1. Introduction. Consider the nonlinear state space model, where the state
process $\{X_t\}_{t \geq 0}$ is a Markov chain on some general state space
$(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ having initial distribution $\chi$ and
transition kernel $Q$. The state process is hidden but partially observed
through the observations $\{Y_t\}_{t \geq 0}$, which are $\mathsf{Y}$-valued
random variables that are independent conditionally on the latent state
sequence $\{X_t\}_{t \geq 0}$; in addition, there exist a $\sigma$-finite
measure $\lambda$ on $(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ and a transition
density function $x \mapsto g(x, y)$, referred to as the likelihood, such that
$\mathbb{P}(Y_t \in A \mid X_t) = \int_A g(X_t, y) \, \lambda(dy)$ for all
$A \in \mathcal{B}(\mathsf{Y})$. The kernel $Q$ and the likelihood function
$x \mapsto g(x, y)$ are assumed to be known. We shall consider the case in
which the observations have arbitrary but fixed values
$y_{0:T} \overset{\mathrm{def}}{=} [y_0, \ldots, y_T]$.

Statistical inference in general state space models involves computing the
posterior distribution of a batch of state variables $X_{s:s'}$ conditioned on
a batch of observations $Y_{t:T}$, which we denote by $\phi_{s:s'|t:T}$ (the
dependence on the observations $Y_{t:T}$ is implicit). The posterior
distribution can be computed in closed form only in very specific cases,
principally when the state space model is linear and Gaussian or when the
state space $\mathsf{X}$ is a finite set. In the vast majority of cases,
nonlinearity or non-Gaussianity renders analytic solutions intractable.

These limitations have stimulated interest in alternative strategies able to
handle more general state and measurement equations without putting strong a
priori constraints on the behaviour of the posterior distributions. Among
these, Sequential Monte Carlo (SMC) methods play a central role. SMC methods
refer to a class of algorithms for approximating a sequence of probability
distributions over a sequence of probability spaces by recursively updating a
set of random particles with associated nonnegative weights. These algorithms
can be seen as a combination of the sequential importance sampling and
sampling importance resampling methods introduced in [15] and [25],
respectively. SMC methods have emerged as a key tool for approximating state
posterior distributions in general state space models; see, for instance,
[?], [21], [22], [24] and the references therein.

The recursive formulas generating the filtering distribution $\phi_{T|0:T}$
and the joint smoothing distribution $\phi_{0:T|0:T}$ are closely related.
Using the basic filtering version of the particle filter actually provides, as
a by-product, an approximation of the joint smoothing distribution, in the
sense that the particle paths and their associated weights can be considered
as a weighted sample approximating $\phi_{0:T|0:T}$. From these joint draws
one may readily obtain fixed lag or fixed interval smoothed samples by simply
extracting the required components from the sampled particle paths and
retaining the same weights. This appealingly simple scheme can be used
successfully for estimating the joint smoothing distribution for small values
of $T$, or any marginal smoothing distribution $\phi_{s|0:T}$ with
$s \leq T$ when $s$ and $T$ are close; however, when $T$ is large or when $s$
and $T$ are remote, the associated particle approximations are inaccurate
[13].

In this article, we consider the forward filtering backward smoothing (FFBSm)
algorithm and the forward filtering backward simulation (FFBSi) sampler.
These algorithms share some similarities with the forward-backward algorithm
for discrete state-space HMM. The FFBSm algorithm consists in reweighting,
in a backward pass, the weighted sample approximating the filtering
distribution (see [18], [16], [13]). The FFBSi sampler draws, conditionally
independently given the particles and the weights obtained in the forward
pass, realizations of the joint fixed interval smoothing distribution; see
[13].

The complexity of the FFBSm algorithm for estimating the marginal fixed
interval smoothing distribution, and of the original formulation of the FFBSi
sampler, generally grows as the square of the number of particles $N$
multiplied by the time horizon $T$. This complexity can be linear in $N$ for
some specific examples. Otherwise, more intricate algorithms have to be
developed to overcome this problem; see for example [19]. Note that these
computational techniques lead to algorithms with complexity of order
$N \log(N)$, but this reduction in complexity comes at the price of
introducing some level of approximation (truncation), which in practice
introduces a bias that might be difficult to control. In this paper, a
modification of the original FFBSi algorithm is presented, having a
complexity which grows linearly in $N$, without having to truncate the
density or to use intricate data structures.

The FFBSm and FFBSi algorithms are very challenging to analyze and, up to now,
only a consistency result is available in [3] (the proof of this result being
plagued by an error). The FFBSm estimate and the FFBSi trajectories at a given
time instant explicitly depend upon all the particles and weights drawn before
and after this time instant. It is therefore impossible to analyze directly
the convergence of this approximation using the standard techniques developed
to study the interacting particle approximations of the Feynman-Kac flows (see
[?] or [?]).

The paper is organized as follows. In Section 2 the FFBSm algorithm and the
FFBSi sampler are introduced. An exponential deviation inequality is first
provided in Section 3 for the fixed-interval joint smoothing distribution. A
Central Limit Theorem (CLT) for this quantity is then obtained in Section 4.
Time-uniform exponential bounds are then computed for the FFBSm marginal
smoothing distribution estimator, under mixing conditions on the kernel $Q$,
in Section 5. Finally, under the same mixing condition, an explicit bound for
the variance of the marginal smoothing distribution estimator is derived in
Section 6.

Notations and Definitions. We denote
$a_{m:n} \overset{\mathrm{def}}{=} (a_m, \ldots, a_n)$ and
$[a_{m:n}, b_{p:q}] \overset{\mathrm{def}}{=} (a_m, \ldots, a_n, b_p, \ldots, b_q)$.
We assume that all random variables are defined on a common probability space
$(\Omega, \mathcal{F}, \mathbb{P})$. A state space $\mathsf{X}$ is said to be
general if it is a Polish space, i.e. its topology is metrizable by some
metric $d$ such that $(\mathsf{X}, d)$ is a complete separable metric space.
We denote by $\mathcal{B}(\mathsf{X})$ the associated Borel $\sigma$-algebra
and by $\mathcal{B}_b(\mathsf{X})$ the set of all bounded
$\mathcal{B}(\mathsf{X})/\mathcal{B}(\mathbb{R})$-measurable functions from
$\mathsf{X}$ to $\mathbb{R}$. For any measure $\mu$ on
$(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ and measurable function $f$ satisfying
$\int |f(x)| \, \mu(dx) < \infty$ we set
$\mu(f) = \int_{\mathsf{X}} f(x) \, \mu(dx)$. Moreover, we say that two
measures $\mu$ and $\nu$ are proportional (written $\mu \propto \nu$) if they
differ only by a normalization constant.

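Let $\mathsf{X}$ and $\mathsf{Y}$ be two general state spaces. A kernel $V$
from $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ to
$(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ is a map from
$\mathsf{X} \times \mathcal{B}(\mathsf{Y})$ into $[0, 1]$ such that, for each
$A \in \mathcal{B}(\mathsf{Y})$, $x \mapsto V(x, A)$ is a nonnegative bounded
measurable function on $\mathsf{X}$ and, for each $x \in \mathsf{X}$,
$A \mapsto V(x, A)$ is a measure on $\mathcal{B}(\mathsf{Y})$. For
$f \in \mathcal{B}_b(\mathsf{Y})$, the function $V(\cdot, f)$ belongs to
$\mathcal{B}_b(\mathsf{X})$ and we sometimes use the abridged notation $Vf$
instead of $V(\cdot, f)$. For a measure $\nu$ on
$(\mathsf{X}, \mathcal{B}(\mathsf{X}))$, we denote by $\nu V$ the measure on
$(\mathsf{Y}, \mathcal{B}(\mathsf{Y}))$ defined by, for any
$A \in \mathcal{B}(\mathsf{Y})$,
$\nu V(A) = \int_{\mathsf{X}} V(x, A) \, \nu(dx)$.

For simplicity, we consider a fully dominated state space model, for which
there exists a $\sigma$-finite measure $\nu$ on
$(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ such that, for all
$x \in \mathsf{X}$, $Q(x, \cdot)$ has a transition probability density
$q(x, \cdot)$ with respect to $\nu$. For notational simplicity, $\nu(dx)$ is
sometimes replaced by $dx$.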

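For any initial distribution $\chi$ on $\mathsf{X}$ and any
$t \leq s \leq s' \leq T$, denote by $\phi_{\chi,s:s'|t:T}$ the posterior
distribution of the state vector $X_{s:s'}$ given the observations $Y_{t:T}$
and knowing that $X_0 \sim \chi$. For all
$A \in \mathcal{B}(\mathsf{X})^{\otimes (s'-s+1)}$, this distribution may be
expressed as

\[
\phi_{\chi,s:s'|t:T}(A) =
\frac{\int \cdots \int \phi_{\chi,t|t-1}(dx_t) \prod_{u=t+1}^{T} g_{u-1}(x_{u-1}) \, Q(x_{u-1}, dx_u) \, g_T(x_T) \, \mathbb{1}_A(x_{s:s'})}
     {\int \cdots \int \phi_{\chi,t|t-1}(dx_t) \prod_{u=t+1}^{T} g_{u-1}(x_{u-1}) \, Q(x_{u-1}, dx_u) \, g_T(x_T)}
\]

with the convention $\prod_{u=s}^{t} = 1$ if $s > t$. For simplicity, we will
use the shorthand notations:

\[
\begin{aligned}
\phi_{\chi,s|t:T} &= \phi_{\chi,s:s|t:T} , \\
\phi_{\chi,s:s'|T} &= \phi_{\chi,s:s'|0:T} , \\
\phi_{\chi,s|T} &= \phi_{\chi,s:s|0:T} .
\end{aligned}
\tag{1}
\]

In the fully dominated case, the smoothing distributions
$\phi_{\chi,s:s'|t:T}$ have densities (which we will denote similarly) with
respect to the product measure $\nu^{\otimes (s'-s+1)}$.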

2. Algorithms. Conditionally on the observations $Y_{0:T}$, the state sequence
$\{X_s\}_{s \geq 0}$ is a time-inhomogeneous Markov chain. This property
remains true in the time-reversed direction, i.e. given a strictly positive
index $T$, initial distribution $\chi$ and index
$s \in \{0, \ldots, T-1\}$, for any $f \in \mathcal{B}_b(\mathsf{X})$,

\[
\mathbb{E}_\chi[f(X_s) \mid X_{s+1:T}, Y_{0:T}]
= \mathbb{E}_\chi[f(X_s) \mid X_{s+1}, Y_{0:s}]
= B_{\chi,s}(X_{s+1}, f) ,
\]

where $B_{\chi,s}(x_{s+1}, \cdot)$ is the backward kernel. In the fully
dominated case, this kernel may be expressed as

\[
B_{\chi,s}(x_{s+1}, A) = \int_A \frac{\phi_{\chi,s|s}(x) \, q(x, x_{s+1})}{\int \phi_{\chi,s|s}(x') \, q(x', x_{s+1}) \, dx'} \, dx . \tag{2}
\]

Using these notations, for any integer $T > 0$, index
$s \in \{0, \ldots, T-1\}$ and initial probability $\chi$, the joint smoothing
distribution may, for all $f \in \mathcal{B}_b(\mathsf{X}^{T-s+1})$, be
recursively expressed as

\[
\phi_{\chi,s:T|T}(f) = \mathbb{E}_\chi[f(X_{s:T}) \mid Y_{0:T}]
= \int \cdots \int f(x_{s:T}) \, B_{\chi,s}(x_{s+1}, dx_s) \, \phi_{\chi,s+1:T|T}(dx_{s+1:T}) , \tag{3}
\]

with $\phi_{\chi,T:T|T} = \phi_{\chi,T|T}$ being the filtering distribution at
time $T$. If $f$ depends on the first component $x_s$ only, then Eq. (3)
yields the marginal smoothing distribution, which is defined recursively by:

\[
\phi_{\chi,s|T}(f) = \int \int f(x_s) \, B_{\chi,s}(x_{s+1}, dx_s) \, \phi_{\chi,s+1|T}(dx_{s+1}) . \tag{4}
\]
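When the state space is a finite set, the backward kernel (2) and the recursion (4) can be evaluated exactly, which gives a convenient sanity check. The following is a minimal sketch under that assumption; the function names and the list-based representation of $\chi$, $Q$ and $g_t$ are illustrative choices, not part of the original text. A brute-force enumeration over all paths is included only to verify the recursion on small examples.

```python
from itertools import product


def forward_filter(chi, Q, G):
    """Filtering distributions phi_{t|t}, t = 0..T, on a finite state space.

    chi[i]: initial probability of state i; Q[i][j]: transition probability
    from i to j; G[t][i]: likelihood g_t(i) of the fixed observation y_t.
    """
    n = len(chi)
    predictive = list(chi)
    filtered = []
    for g in G:
        unnormalized = [predictive[i] * g[i] for i in range(n)]
        total = sum(unnormalized)
        current = [u / total for u in unnormalized]
        filtered.append(current)
        predictive = [sum(current[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return filtered


def backward_smoother(filtered, Q):
    """Marginal smoothing distributions phi_{s|T} via the backward recursion (4)."""
    n = len(filtered[0])
    T = len(filtered) - 1
    smoothed = [None] * (T + 1)
    smoothed[T] = list(filtered[T])
    for s in range(T - 1, -1, -1):
        marginal = [0.0] * n
        for j in range(n):  # j plays the role of x_{s+1}
            denom = sum(filtered[s][k] * Q[k][j] for k in range(n))
            if denom == 0.0:
                continue  # state j is unreachable at time s+1 and carries no mass
            for i in range(n):
                backward = filtered[s][i] * Q[i][j] / denom  # B_s(j, {i}), Eq. (2)
                marginal[i] += backward * smoothed[s + 1][j]
        smoothed[s] = marginal
    return smoothed


def brute_force_smoother(chi, Q, G):
    """Reference computation: enumerate every path x_{0:T} (small examples only)."""
    n, T = len(chi), len(G) - 1
    marg = [[0.0] * n for _ in range(T + 1)]
    for path in product(range(n), repeat=T + 1):
        w = chi[path[0]] * G[0][path[0]]
        for t in range(1, T + 1):
            w *= Q[path[t - 1]][path[t]] * G[t][path[t]]
        for t, state in enumerate(path):
            marg[t][state] += w
    return [[m / sum(row) for m in row] for row in marg]
```

The backward pass reuses only the stored filtering distributions and the transition matrix, which is the structure that the particle algorithms of this section mimic with weighted samples.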
The method proposed by [?], [?] consists in approximating the smoothing
distribution by storing the particles and associated weights obtained in a
forward filtering pass and revising the weights in a backward smoothing pass.
In the forward pass, particle approximations $\phi^N_{\chi,s|s}$ of the
filtering distributions $\phi_{\chi,s|s}$ are computed recursively for
$s = 0, \ldots, T$. Each approximation is formed by a set of particles
$\{\xi_s^i\}_{i=1}^N$ and associated importance weights
$\{\omega_s^i\}_{i=1}^N$ according to

\[
\phi^N_{\chi,s|s}(dx) = \Omega_s^{-1} \sum_{i=1}^N \omega_s^i \, \delta_{\xi_s^i}(dx) , \tag{5}
\]

where $\Omega_s = \sum_{i=1}^N \omega_s^i$ and $\delta_x$ denotes the Dirac
mass located at $x$. There are several ways of producing such weighted samples
$\{(\xi_s^i, \omega_s^i)\}_{i=1}^N$; see [?], [?], and the references therein.
Most of these algorithms can be recast into the common unifying framework of
the auxiliary particle filter. Let $\{\xi_0^i\}_{i=1}^N$ be i.i.d. random
variables such that $\xi_0^i \sim \rho_0$ and set
$\omega_0^i = \frac{d\chi}{d\rho_0}(\xi_0^i) \, g_0(\xi_0^i)$. By classical
importance sampling, the weighted sample $\{(\xi_0^i, \omega_0^i)\}_{i=1}^N$
targets the distribution $\phi_{\chi,0|0}$. Assume now that the weighted
sample $\{(\xi_{s-1}^i, \omega_{s-1}^i)\}_{i=1}^N$ targets
$\phi_{\chi,s-1|s-1}$, i.e. for $h \in \mathcal{B}_b(\mathsf{X})$,
$\Omega_{s-1}^{-1} \sum_{i=1}^N \omega_{s-1}^i h(\xi_{s-1}^i)$ is an estimate
of $\int \phi_{\chi,s-1|s-1}(dx) \, h(x)$. We may approximate
$\phi_{\chi,s|s}$ by replacing $\phi_{\chi,s-1|s-1}$ in the forward filtering
recursion

\[
\phi_{\chi,s|s}(f) \propto \int \int \phi_{\chi,s-1|s-1}(dx_{s-1}) \, q(x_{s-1}, x_s) \, g_s(x_s) \, f(x_s) \, dx_s \tag{6}
\]

by its particle approximation $\phi^N_{\chi,s-1|s-1}$, leading to the target
distribution

\[
\phi^{N,\mathrm{t}}_{\chi,s|s}(dx) \propto \sum_{i=1}^N \omega_{s-1}^i \, q(\xi_{s-1}^i, x) \, g_s(x) \, dx . \tag{7}
\]

To avoid an $O(N^2)$ algorithm, [?] introduces an auxiliary variable
corresponding to the selected particle index and targets instead the
probability density

\[
\phi^{N,\mathrm{a}}_{\chi,s|s}(i, x_s) \propto \omega_{s-1}^i \, q(\xi_{s-1}^i, x_s) \, g_s(x_s) \tag{8}
\]

on the product space $\{1, \ldots, N\} \times \mathsf{X}$. Since
$\phi^{N,\mathrm{t}}_{\chi,s|s}$ is the marginal distribution of
$\phi^{N,\mathrm{a}}_{\chi,s|s}$ with respect to the particle index, we may
sample from $\phi^{N,\mathrm{t}}_{\chi,s|s}$ by simulating instead a set
$\{(I_s^i, \xi_s^i)\}_{i=1}^N$ of indices and particle positions from an
instrumental distribution having probability density

\[
\pi_{s|s}(i, x_s) \propto \omega_{s-1}^i \, \vartheta_s(\xi_{s-1}^i) \, p_s(\xi_{s-1}^i, x_s) , \tag{9}
\]

where $\{\vartheta_s(\xi_{s-1}^i)\}_{i=1}^N$ are the so-called adjustment
multiplier weights and $p_s$ is the proposal transition density function. Each
draw $(I_s^i, \xi_s^i)$ is assigned the weight

\[
\omega_s^i \overset{\mathrm{def}}{=} \frac{q(\xi_{s-1}^{I_s^i}, \xi_s^i) \, g_s(\xi_s^i)}{\vartheta_s(\xi_{s-1}^{I_s^i}) \, p_s(\xi_{s-1}^{I_s^i}, \xi_s^i)} , \tag{10}
\]

which is proportional to
$\phi^{N,\mathrm{a}}_{\chi,s|s}(I_s^i, \xi_s^i) / \pi_{s|s}(I_s^i, \xi_s^i)$.
Hereafter, the indices are discarded and $\{(\xi_s^i, \omega_s^i)\}_{i=1}^N$
is taken as an approximation of the target distribution $\phi_{\chi,s|s}$. The
simplest choice, yielding the so-called bootstrap particle filter algorithm
proposed by [?], consists in setting, for all $x \in \mathsf{X}$,
$\vartheta_s(x) = 1$ and $p_s(x, \cdot) = q(x, \cdot)$. A more appealing
choice from a theoretical standpoint, but often computationally costly,
consists in setting
$\vartheta_s^*(x) = \int q(x, x_s) \, g_s(x_s) \, dx_s$ and

\[
p_s^*(x, x_s) = \frac{q(x, x_s) \, g_s(x_s)}{\vartheta_s^*(x)} .
\]

In this case, the importance weights $\{\omega_s^i\}_{i=1}^N$ are all unity
and the auxiliary particle filter is said to be fully adapted. Sampling from
the fully adapted version of the auxiliary particle filter is in general
difficult; the general method, based on the auxiliary accept-reject principle,
proposed by [3] and [?] for sampling from these distributions is, with few
exceptions, computationally involved. Other choices are discussed in [?] and
[?].
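As an illustration of the forward pass in its simplest (bootstrap) form, the sketch below runs a particle filter with adjustment multipliers equal to one and proposal equal to the transition density. The model is an assumption made only for this example: a scalar Gaussian autoregression $X_t = a X_{t-1} + \sigma_x W_t$ observed as $Y_t = X_t + \sigma_y V_t$, with $\rho_0 = \chi = N(0, \sigma_x^2)$; the function and parameter names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bootstrap_filter(ys, n, a=0.9, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Bootstrap particle filter for the assumed toy model.

    Returns (particles, weights): particles[t][i] is xi_t^i and weights[t][i]
    is the unnormalized importance weight omega_t^i, for t = 0..len(ys)-1.
    """
    rng = random.Random(seed)
    # Time 0: draw from rho_0 = chi, so omega_0^i reduces to g_0(xi_0^i).
    xi = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    w = [gaussian_pdf(ys[0], x, sigma_y) for x in xi]
    particles, weights = [xi], [w]
    for y in ys[1:]:
        # Selection: ancestor indices drawn proportionally to omega_{s-1}^i
        # (adjustment multipliers equal to one).
        ancestors = rng.choices(xi, weights=w, k=n)
        # Mutation: propagate through the transition density q.
        xi = [a * x + rng.gauss(0.0, sigma_x) for x in ancestors]
        # Weighting: with p_s = q the weight (10) reduces to g_s(xi_s^i).
        w = [gaussian_pdf(y, x, sigma_y) for x in xi]
        particles.append(xi)
        weights.append(w)
    return particles, weights


def filter_mean(xi, w):
    """Self-normalized estimate of the filtering mean from one weighted sample."""
    total = sum(w)
    return sum(wi * x for wi, x in zip(w, xi)) / total
```

The stored lists `particles` and `weights` are exactly what a backward pass needs; nothing else from the forward pass has to be retained.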

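2.1. The Forward Filtering Backward Smoothing algorithm. Following [?], the
smoothing distribution can be approximated by filtering passes in the forward
as well as the backward directions. Firstly, the particle filter is executed,
while storing the weighted sample $\{(\xi_t^i, \omega_t^i)\}_{i=1}^N$,
$1 \leq t \leq T$; secondly, starting with the particle approximation of the
filtering distribution at time $T$, the importance weights are recursively
updated backwards in time by combining particle estimates of the fixed
interval smoothing distribution $\phi_{\chi,s+1:T|T}$ and the filtering
distribution estimate $\phi^N_{\chi,s|s}$. For $1 \leq s \leq t \leq T$,
define
$\xi_{s:t}^{j_{s:t}} \overset{\mathrm{def}}{=} [\xi_s^{j_s}, \ldots, \xi_t^{j_t}]$.
An approximation

\[
B^N_{\chi,s}(x_{s+1}, dx_s) = \sum_{i=1}^N \frac{\omega_s^i \, q(\xi_s^i, x_{s+1})}{\sum_{\ell=1}^N \omega_s^\ell \, q(\xi_s^\ell, x_{s+1})} \, \delta_{\xi_s^i}(dx_s) \tag{11}
\]

of the backward kernel can be obtained by revising the weights
$\{\omega_s^i\}_{i=1}^N$ without moving the particles $\{\xi_s^i\}_{i=1}^N$.
If in addition the joint smoothing distribution $\phi_{\chi,s+1:T|T}$ is
approximated at time $s+1$ using the weighted sample
$\{(\xi_{s+1:T}^{j_{s+1:T}}, \omega_{s+1:T|T}^{j_{s+1:T}})\}$,
$j_{s+1:T} \in \{1, \ldots, N\}^{T-s}$, i.e.

\[
\phi^N_{\chi,s+1:T|T}(dx_{s+1:T}) \propto \sum_{j_{s+1:T}=1}^N \omega_{s+1:T|T}^{j_{s+1:T}} \, \delta_{\xi_{s+1:T}^{j_{s+1:T}}}(dx_{s+1:T}) , \tag{12}
\]

then we may substitute this and the approximation (11) of the backward kernel
into (3) to obtain

\[
\phi^N_{\chi,s:T|T}(dx_{s:T}) \propto \sum_{j_{s:T}=1}^N \omega_{s:T|T}^{j_{s:T}} \, \delta_{\xi_{s:T}^{j_{s:T}}}(dx_{s:T}) ,
\]

where the new weight is recursively updated according to

\[
\omega_{s:T|T}^{j_{s:T}} = \frac{\omega_s^{j_s} \, q(\xi_s^{j_s}, \xi_{s+1}^{j_{s+1}})}{\sum_{\ell=1}^N \omega_s^\ell \, q(\xi_s^\ell, \xi_{s+1}^{j_{s+1}})} \, \omega_{s+1:T|T}^{j_{s+1:T}} . \tag{13}
\]

The estimator of the joint smoothing distribution may be rewritten as

\[
\phi^N_{\chi,s:T|T}(h) = \sum_{j_{s:T}=1}^N \varpi_{s:T}^{j_{s:T}} \, \frac{\omega_T^{j_T}}{\Omega_T} \, h\left(\xi_{s:T}^{j_{s:T}}\right) , \tag{14}
\]

where

\[
\varpi_{s:t}^{j_{s:t}} \overset{\mathrm{def}}{=} \prod_{u=s+1}^{t} \frac{\omega_{u-1}^{j_{u-1}} \, q(\xi_{u-1}^{j_{u-1}}, \xi_u^{j_u})}{\sum_{\ell=1}^N \omega_{u-1}^\ell \, q(\xi_{u-1}^\ell, \xi_u^{j_u})} , \tag{15}
\]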

with the convention Ha = 1 if a > ^ so that ro^f^^ = 1. Since X^j^.^.i '^i".t 



titW may be alternatively expressed as 



A'' iT 



Ks:Mh) = E Zf ^:^) ■ (16) 



This estimate of the joint smoothing distribution may be understood as an
approximate importance sampling estimator. The estimator
$\phi^N_{\chi,s:T|T}(h)$ is highly impractical, because its support is the set
of $N^{T-s+1}$ possible particle paths $\{\xi_{s:T}^{j_{s:T}}\}$.
Nevertheless, this estimator plays a key role in the theoretical derivations.

The importance weight of these path particles is computed as if the path 
particle ^^f-j?^ were simulated by drawing forward in time, for s < t < T, S^f* in 
the set {^,1}^^, conditionally independently from {^I'^Si} from the distribution 

N 

^t\T.^li<l{ilir) , i = l,...,iV, (17) 

1=1 

which approximates the predictive distribution $\phi_{\chi,t|t-1}$. Of course,
the distribution of $\xi_{s:T}^{j_{s:T}}$ is not exactly the product of the
marginal distributions (17), because the particle positions are not
independent (this approximation would be approximately correct for a finite
block of particles selected randomly, using the propagation of chaos property;
see e.g. [?, chapter 8]). This is why standard results on importance sampling
estimators cannot be applied in this context.

Most often, it is not required to compute the joint smoothing distribution but
rather the marginal smoothing distribution $\phi_{\chi,s|T}$ (or more
generally some fixed dimensional marginal of the joint smoothing distribution,
$\phi_{\chi,s:s+\Delta|T}$ for a positive integer $\Delta$). Approximations of
the marginal smoothing distributions may be obtained by associating to the set
of particles $\{\xi_s^{j_s}\}$, $j_s \in \{1, \ldots, N\}$, the weights
obtained by marginalizing the joint smoothing weights over the components
$j_{s+1:T} \in \{1, \ldots, N\}^{T-s}$,
$\omega_{s|T}^{j_s} = \sum_{j_{s+1:T}=1}^N \omega_{s:T|T}^{j_{s:T}}$. It is
easily seen that these marginal weights can be recursively updated as follows:

\[
\omega_{s|T}^{j} = \sum_{\ell=1}^N \frac{\omega_s^{j} \, q(\xi_s^{j}, \xi_{s+1}^{\ell})}{\sum_{i=1}^N \omega_s^{i} \, q(\xi_s^{i}, \xi_{s+1}^{\ell})} \, \omega_{s+1|T}^{\ell} , \quad j = 1, \ldots, N . \tag{18}
\]
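The backward recursion for the marginal smoothing weights can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: `particles[t][i]` and `weights[t][i]` are the stored forward-pass output, `q(x, x_next)` is any user-supplied transition density, and the weights at time $T$ are normalized to sum to one so that every row returned is a probability vector. The two nested loops over particles make the quadratic cost per time step explicit.

```python
def ffbsm_marginal_weights(particles, weights, q):
    """Marginal smoothing weights omega_{s|T}^j for s = 0..T (normalized).

    particles[t][i] : particle xi_t^i from the forward pass
    weights[t][i]   : unnormalized filtering weight omega_t^i
    q(x, x_next)    : transition density (assumed supplied by the user)
    """
    T = len(particles) - 1
    total = sum(weights[T])
    smooth = [None] * (T + 1)
    # Initialization: at time T the smoothing and filtering weights coincide.
    smooth[T] = [w / total for w in weights[T]]
    for s in range(T - 1, -1, -1):
        n = len(particles[s])
        new = [0.0] * n
        for l, x_next in enumerate(particles[s + 1]):
            # Numerators omega_s^i q(xi_s^i, xi_{s+1}^l) and their sum.
            trans = [weights[s][i] * q(particles[s][i], x_next) for i in range(n)]
            denom = sum(trans)
            if denom == 0.0:
                continue  # xi_{s+1}^l has no admissible ancestor; its mass is dropped
            for j in range(n):
                new[j] += trans[j] / denom * smooth[s + 1][l]
        smooth[s] = new
    return smooth
```

A quick consistency check: if `q` does not depend on its first argument, the backward kernel ignores the future and the smoothing weights reduce to the normalized filtering weights.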



The complexity of this estimator of the marginal smoothing distribution is
$O(N^2 T)$, which is manageable only if the number of particles is moderate.
When the dimension of the input space is not too large, this computational
cost can be considerably reduced to $N \log(N)$, but at the price of
truncating the distribution and therefore introducing some amount of bias (see
for example [19]). Note that, in certain specific scenarios (such as discrete
Markov chains over large state space with sparse transition matrix), the
complexity can even be reduced to $O(NT)$.

imsart-aos ver. 2007/12/10 file: dgarm.tex date: April 2, 2009

DOUC ET AL.

2.2. The Forward Filtering Backward Simulation. Another way of understanding
(14) consists in noting that the importance weight (13) is a probability
distribution over $\{1, \ldots, N\}^{T-s}$; more precisely,
$\omega_{s+1:T|T}^{j_{s+1:T}} = \mathbb{P}(J_{s+1:T} = j_{s+1:T} \mid \mathcal{F}_T)$,
where
$\mathcal{F}_s = \sigma\{(\xi_t^i, \omega_t^i); 0 \leq t \leq s, 1 \leq i \leq N\}$
and $\{J_u\}_{u=s}^{T}$ is a reversed Markov chain with a final distribution
$\omega_T^i / \Omega_T$, $i = 1, \ldots, N$, and backward transition matrix
$\{\Lambda_s(i, j)\}_{i,j=1}^N$,
$\mathbb{P}(J_s = j \mid \mathcal{F}_T \vee \sigma(J_{s+1})) = \Lambda_s(J_{s+1}, j)$,
with

\[
\Lambda_s(i, j) = \frac{\omega_s^j \, q(\xi_s^j, \xi_{s+1}^i)}{\sum_{\ell=1}^N \omega_s^\ell \, q(\xi_s^\ell, \xi_{s+1}^i)} , \quad i, j = 1, \ldots, N . \tag{19}
\]

With these definitions, the joint smoothing distribution may be written as the
conditional expectation

\[
\phi^N_{\chi,s:T|T}(h) = \mathbb{E}\left[ h\left(\xi_{s:T}^{J_{s:T}}\right) \mid \mathcal{F}_T \right] . \tag{20}
\]

The idea of simulating the indices $J_{s:T}$ backward in time to draw
approximately from the smoothing distribution $\phi_{\chi,s:T|T}$ has been
proposed in [?] (Algorithm 1, p. 158). This algorithm proceeds recursively
backward in time as follows. At time $T$, we draw, conditionally independently
given $\mathcal{F}_T$, $N$ indices $\{J_T^\ell\}_{\ell=1}^N$ from the
distribution $\{\omega_T^i / \Omega_T\}_{i=1}^N$ (for ease of notation, we
draw the same number of particles in the forward and backward passes, but
there is no need to do that). Given now an $N$-sample
$\{J_{s+1:T}^\ell\}_{\ell=1}^N \in \{1, \ldots, N\}^{T-s}$, we draw
conditionally independently $J_s^\ell \in \{1, \ldots, N\}$,
$\ell = 1, \ldots, N$, from the distributions
$\{\Lambda_s(J_{s+1}^\ell, i)\}_{i=1}^N$.

This algorithm is referred to in the sequel as the forward filtering backward
simulation algorithm (FFBSi). This sample yields the following (practical)
estimator of the joint fixed-interval smoothing distribution:

\[
\tilde{\phi}^N_{\chi,s:T|T}(h) = N^{-1} \sum_{\ell=1}^N h\left(\xi_{s:T}^{J_{s:T}^\ell}\right) , \quad h \in \mathcal{B}_b(\mathsf{X}^{T-s+1}) . \tag{21}
\]
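The backward simulation step can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: `particles` and `weights` are the stored output of any forward particle filter, `q(x, x_next)` is a user-supplied transition density, and the helper names are hypothetical. Each trajectory costs $O(N)$ operations per time step because the full vector of backward probabilities is rebuilt before every draw.

```python
import random


def ffbsi_paths(particles, weights, q, n_paths, rng):
    """Draw n_paths index trajectories J_{0:T} by backward simulation.

    particles[t][i], weights[t][i] : stored forward-pass output
    q(x, x_next)                   : transition density (user supplied)
    rng                            : a random.Random instance
    """
    T = len(particles) - 1
    paths = []
    for _ in range(n_paths):
        idx = [0] * (T + 1)
        # Final index drawn from the normalized filtering weights at time T.
        idx[T] = rng.choices(range(len(particles[T])), weights=weights[T], k=1)[0]
        for s in range(T - 1, -1, -1):
            x_next = particles[s + 1][idx[s + 1]]
            # Unnormalized backward probabilities omega_s^i q(xi_s^i, x_next).
            backward = [
                weights[s][i] * q(particles[s][i], x_next)
                for i in range(len(particles[s]))
            ]
            idx[s] = rng.choices(range(len(backward)), weights=backward, k=1)[0]
        paths.append(idx)
    return paths


def smoothed_mean(particles, paths, s):
    """Equal-weight average of the sampled positions at time s."""
    return sum(particles[s][p[s]] for p in paths) / len(paths)
```

Because the sampled trajectories carry equal weights, any smoothing functional is estimated by a plain average over the returned paths, as `smoothed_mean` does for the time-$s$ marginal.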

The computational complexity for each individual realization is $O(N)$ at each
time step, so the overall computational effort to estimate
$\phi_{\chi,0:T|T}$ is therefore $O(N^2 T)$. Using the methods introduced by
[19], this complexity can be further reduced to $O(N \log(N) T)$, but here
again at the price of some additional approximations. It is easy to modify
this algorithm to make it linear in $N$. Assume that the transition kernel $q$
is bounded, $q(x, x') \leq |q|_\infty$. Since
$\omega_s^i \, q(\xi_s^i, \xi_{s+1}^j) \leq |q|_\infty \, \omega_s^i$ for any
$i, j \in \{1, \ldots, N\}$, we may sample (19) using the accept-reject
mechanism. For any $\ell = 1, \ldots, N$, we sample independently indices
$I_s^{\ell,u}$, $u = 1, 2, \ldots$ from the distribution
$\{\omega_s^i / \Omega_s\}_{i=1}^N$ and uniform random variables
$U_s^{\ell,u}$ on $[0, 1]$, and let $J_s^\ell = I_s^{\ell,\tau}$, where $\tau$
is the first index $u$ for which
$U_s^{\ell,u} \leq q(\xi_s^{I_s^{\ell,u}}, \xi_{s+1}^{J_{s+1}^\ell}) / |q|_\infty$.
The complexity of the resulting algorithm is linear instead of quadratic in
$N$.
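The accept-reject mechanism for one backward time step can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: `q_max` is any known upper bound on the transition density, the function name is hypothetical, and the `max_tries` cap with an exact fallback draw is a safeguard added here, not something described in the text. Proposals are drawn by bisection over cumulative weights computed once, so each proposal costs $O(\log N)$; a constant-time alias table could be substituted.

```python
import random
from itertools import accumulate


def accept_reject_backward_indices(xs, ws, next_positions, q, q_max, rng, max_tries=1000):
    """Sample one backward index J_s per entry of next_positions.

    xs, ws         : particles xi_s^i and weights omega_s^i at time s
    next_positions : positions xi_{s+1}^{J_{s+1}} already selected at time s+1
    q, q_max       : transition density and an upper bound on it
    rng            : a random.Random instance
    """
    n = len(xs)
    cum = list(accumulate(ws))  # computed once and shared by all proposals
    indices = []
    for x_next in next_positions:
        chosen = None
        for _ in range(max_tries):
            # Propose an index from the filtering weights omega_s^i / Omega_s.
            i = rng.choices(range(n), cum_weights=cum, k=1)[0]
            # Accept with probability q(xi_s^i, x_next) / q_max; the strict
            # inequality avoids accepting an index of zero density.
            if rng.random() * q_max < q(xs[i], x_next):
                chosen = i
                break
        if chosen is None:
            # Fallback (assumption of this sketch): exact O(N) draw.
            backward = [ws[i] * q(xs[i], x_next) for i in range(n)]
            chosen = rng.choices(range(n), weights=backward, k=1)[0]
        indices.append(chosen)
    return indices
```

The expected number of proposals per index is governed by the average acceptance probability, so the linear cost holds in practice when `q_max` is not much larger than typical values of the density along the sampled pairs.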
3. Exponential deviation inequality. In this section, we establish the
properties of the forward filtering backward smoothing algorithm. We first
establish a non-asymptotic deviation inequality. For any function
$f : \mathsf{X}^d \to \mathbb{R}$, we define
$|f|_\infty = \sup_{x \in \mathsf{X}^d} |f(x)|$ and
$\mathrm{osc}(f) = \sup_{(x, x') \in \mathsf{X}^d \times \mathsf{X}^d} |f(x) - f(x')|$.
Denote $\bar{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$ and consider the
following assumptions. We denote by $T$ the horizon, which can be either a
finite integer or infinite.

A 1. $\sup_{0 \leq t \leq T} |g_t|_\infty < \infty$.

Define for $t \geq 0$ the importance weight functions:

\[
\omega_0(x) \overset{\mathrm{def}}{=} \frac{d\chi}{d\rho_0}(x) \, g_0(x)
\quad \text{and} \quad
\omega_t(x, x') \overset{\mathrm{def}}{=} \frac{q(x, x') \, g_t(x')}{\vartheta_t(x) \, p_t(x, x')} , \quad t \geq 1 . \tag{22}
\]

A 2. $\sup_{0 \leq t \leq T} |\vartheta_t|_\infty < \infty$ and
$\sup_{0 \leq t \leq T} |\omega_t|_\infty < \infty$.

The latter assumption is rather mild. It holds in particular under (A1) for
the bootstrap filter ($q = p_t$ and $\vartheta_t = 1$). It automatically holds
in the fully adapted case ($\omega_t = 1$).

The first step in our proof consists in obtaining an exponential deviation
inequality of Hoeffding type for the auxiliary particle approximations of the
forward filtering distribution $\phi_{\chi,t|t}$. Such results can be adapted
from [?, Chapter 7], using the Feynman-Kac representation of the auxiliary
filter. For the sake of completeness, we prove these results explicitly. By
convention, we set $\phi_{\chi,0|-1} = \chi$ and $\vartheta_0 = 1$.
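To give a feel for what a Hoeffding-type inequality delivers, the sketch below evaluates the classical Hoeffding bound for the average of $N$ independent random variables taking values in an interval $[lo, hi]$, namely $\mathbb{P}(|\bar{Z}_N - \mathbb{E}\bar{Z}_N| \geq \epsilon) \leq 2 \exp(-2 N \epsilon^2 / (hi - lo)^2)$, and inverts it to find a sufficient sample size. The function names are illustrative. Under bounded weights and a bounded test function $h$, a weighted term $\omega h(\xi)$ lies in an interval of length at most $2 |\omega|_\infty |h|_\infty$, which is the kind of range one would plug in.

```python
import math


def hoeffding_bound(n, eps, lo, hi):
    """Two-sided Hoeffding bound for the mean of n independent variables in [lo, hi]."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2 / (hi - lo) ** 2))


def particles_needed(eps, delta, lo, hi):
    """Smallest n for which the Hoeffding bound at deviation eps is at most delta."""
    return max(1, math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2)))
```

The sample size grows like $\epsilon^{-2}$ and only logarithmically in $1/\delta$, which is the qualitative behaviour that exponential deviation inequalities of this type encode.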

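Proposition 1. Assume that A1-2 hold. Then, for all $t \in \{0, \ldots, T\}$,
there exist $0 < B, C < \infty$ such that for all $N$, $\epsilon > 0$, and all
measurable functions $h$,

\[
\mathbb{P}\left[ \left| N^{-1} \sum_{i=1}^N \omega_t^i h(\xi_t^i) - \frac{\phi_{\chi,t|t-1}(g_t h)}{\phi_{\chi,t-1|t-1}(\vartheta_t)} \right| \geq \epsilon \right] \leq B e^{-C N \epsilon^2 / |h|_\infty^2} , \tag{23}
\]

\[
\mathbb{P}\left[ \left| \phi^N_{\chi,t|t}(h) - \phi_{\chi,t|t}(h) \right| \geq \epsilon \right] \leq B e^{-C N \epsilon^2 / \mathrm{osc}^2(h)} , \tag{24}
\]

where the weighted sample $\{(\xi_t^i, \omega_t^i)\}_{i=1}^N$ is defined in
(10).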



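Proof. We prove (23) and (24) together by induction on $t \geq 0$. First note
that, by construction, $\{(\xi_t^i, \omega_t^i)\}_{1 \leq i \leq N}$ are
i.i.d. conditionally on the $\sigma$-field
$\mathcal{F}_{t-1} \overset{\mathrm{def}}{=} \sigma\{(\xi_s^i, \omega_s^i); 0 \leq s \leq t-1, 1 \leq i \leq N\}$.
Under (A2), we may therefore apply the Hoeffding inequality, which implies

\[
\mathbb{P}\left[ \left| N^{-1} \sum_{i=1}^N \omega_t^i h(\xi_t^i) - \mathbb{E}\left[ N^{-1} \sum_{i=1}^N \omega_t^i h(\xi_t^i) \,\Big|\, \mathcal{F}_{t-1} \right] \right| \geq \epsilon \right] \leq 2 \exp\left( - \frac{N \epsilon^2}{2 |\omega_t|_\infty^2 |h|_\infty^2} \right) . \tag{25}
\]

For $t = 0$,

\[
\mathbb{E}\left[ N^{-1} \sum_{i=1}^N \omega_0^i h(\xi_0^i) \,\Big|\, \mathcal{F}_{t-1} \right] = \mathbb{E}\left[ \omega_0^1 h(\xi_0^1) \right] = \chi(g_0 h) = \phi_{\chi,0|-1}(g_0 h) .
\]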
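Thus, (24) follows for $t = 0$ by Lemma 11 applied with
$a_N = N^{-1} \sum_{i=1}^N \omega_0^i h(\xi_0^i)$,
$b_N = N^{-1} \sum_{i=1}^N \omega_0^i$, $c_N = 0$ and
$b = \beta = \chi(g_0)$, conditions (I), (II) and (III) being obviously
satisfied. For $t \geq 1$, we prove (23) by deriving an exponential inequality
for
$\mathbb{E}[N^{-1} \sum_{i=1}^N \omega_t^i h(\xi_t^i) \mid \mathcal{F}_{t-1}]$
thanks to the induction assumption. It follows from the definition that

\[
\mathbb{E}\left[ N^{-1} \sum_{i=1}^N \omega_t^i h(\xi_t^i) \,\Big|\, \mathcal{F}_{t-1} \right]
= \frac{\sum_{i=1}^N \omega_{t-1}^i \int q(\xi_{t-1}^i, x) \, g_t(x) \, h(x) \, dx}{\sum_{\ell=1}^N \omega_{t-1}^\ell \, \vartheta_t(\xi_{t-1}^\ell)}
= \frac{\sum_{i=1}^N \omega_{t-1}^i \int Q(\xi_{t-1}^i, dx) \, g_t(x) \, h(x)}{\sum_{\ell=1}^N \omega_{t-1}^\ell \, \vartheta_t(\xi_{t-1}^\ell)} . \tag{26}
\]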

We apply Lemma [TT] by successively checking conditions (jl]), (jll]) and (jllip with 





dof 






dcf 




< 

CAT 


dcf 






dcf 


r dcf ^ dcf 

b = p = 



We have that 



hN 

CN 



E 



dN 



'x,t-i|t-i 

3 l/^loo ■ 



M 



< \^tL \h\ 
Q{-,x)gt{x) 



M-)Pt{-,x) 



pt{-,x)h{x) dx 



Thus, condition ^ is satisfied. Now, assume that the induction assumption ([2 
holds where t is replaced by t — 1. Then, 



OAT 



dN ^ 



N 



With H{^U) J Q{CU,dx)gt{x)h{x) - '^"'-^''-;^;^_^,^_^(,^) 
by noting that (l)^^t-i\t-i{H) = 0, exponential inequalities for un — {cn /dN)bN 
and bN — b are then directly derived from the induction assumption under (A[T]l2]). 
Thus Lemma [11] applies and finally (j23p is proved for t > 1. 

To conclude, it remains to see why (23) implies (24). Without loss of
generality, we assume that $\phi_{\chi,t|t}(h) = 0$. An exponential
inequality for
$\phi^N_{\chi,t|t}(h) = \Omega_t^{-1} \sum_{i=1}^N \omega_t^i h(\xi_t^i)$ is
obtained by applying Lemma 11 with



[/ Q{-Ax)gt{x)h{x)\ , j , 
———737) '!^H?t-iJ- 



And 



OAT 
^AT 



dcf 



dcf .^Af 
CAT = 



def 
def 



a def 0x,t-i|*-i[/ 



Px,i_l|t-i(i?t) 
where conditions (I), (II), (III) are obviously derived from (23), since the
condition $\phi_{\chi,t|t}(h) = 0$ directly implies that



5x,i-llt"l 



Q{-,dx)gt{x)h{x) 



□ 



Using the exponential deviation inequality for the auxiliary particle approx- 
imation of the filtering density, it is now possible to derive an exponential in- 
equality for the forward filtering backward smoothing approximation of the joint 
smoothing distribution. 

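Theorem 2. Assume A1-2. Let $1 \leq s \leq T$. There exist
$0 < B, C < \infty$ such that for all $N$, $\epsilon > 0$, and all measurable
functions $h$,

\[
\mathbb{P}\left[ \left| \phi^N_{\chi,s:T|T}(h) - \phi_{\chi,s:T|T}(h) \right| \geq \epsilon \right] \leq B e^{-C N \epsilon^2 / \mathrm{osc}^2(h)} , \tag{27}
\]

\[
\mathbb{P}\left[ \left| \tilde{\phi}^N_{\chi,s:T|T}(h) - \phi_{\chi,s:T|T}(h) \right| \geq \epsilon \right] \leq B e^{-C N \epsilon^2 / \mathrm{osc}^2(h)} , \tag{28}
\]

where $\phi^N_{\chi,s:T|T}(h)$ and $\tilde{\phi}^N_{\chi,s:T|T}(h)$ are
defined in (14) and (21).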

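Proof. Using (20) and the definition of $\tilde{\phi}^N_{\chi,s:T|T}(h)$, we
may write

\[
\tilde{\phi}^N_{\chi,s:T|T}(h) - \phi^N_{\chi,s:T|T}(h) = N^{-1} \sum_{\ell=1}^N \left\{ h\left(\xi_{s:T}^{J_{s:T}^\ell}\right) - \mathbb{E}\left[ h\left(\xi_{s:T}^{J_{s:T}}\right) \,\Big|\, \mathcal{F}_T \right] \right\} ,
\]

which implies (28) by the Hoeffding inequality and (27). We now prove (27).
Let $s \leq t \leq T$. For $h$ a measurable function defined on
$\mathsf{X}^{T-s+1}$, define the kernels
$L_{s,T,T}(\xi_{s:T}^{j_{s:T}}, h) = h(\xi_{s:T}^{j_{s:T}})$ and, for
$s \leq t < T$,

\[
L_{s,t,T}(\xi_{s:t}^{j_{s:t}}, h) \overset{\mathrm{def}}{=} \int \cdots \int Q(\xi_t^{j_t}, dx_{t+1}) \, g_{t+1}(x_{t+1}) \prod_{u=t+1}^{T-1} Q(x_u, dx_{u+1}) \, g_{u+1}(x_{u+1}) \, h\left([\xi_{s:t}^{j_{s:t}}, x_{t+1:T}]\right) . \tag{29}
\]

By construction, $L_{s,t,T}$ can be obtained recursively backwards in time as
follows:

\[
L_{s,t-1,T}(\xi_{s:t-1}^{j_{s:t-1}}, h) = \int Q(\xi_{t-1}^{j_{t-1}}, dx) \, g_t(x) \, L_{s,t,T}\left([\xi_{s:t-1}^{j_{s:t-1}}, x], h\right) . \tag{30}
\]

Denote by $\mathbb{1}$ the function identically equal to one. Under (A1),
$|L_{s,t,T}(\cdot, h)|_\infty \leq |L_{s,t,T}(\cdot, \mathbb{1})|_\infty |h|_\infty$,
and $|L_{s,t,T}(\cdot, \mathbb{1})|_\infty < \infty$. Denote

\[
F_{s,s,T}(x, h) \overset{\mathrm{def}}{=} L_{s,s,T}(x, h) , \tag{31}
\]

and for all $t \in \{s+1, \ldots, T\}$, set

\[
F_{s,t,T}(x, h) \overset{\mathrm{def}}{=} \sum_{j_{s:t-1}=1}^N \varpi_{s:t-1}^{j_{s:t-1}} \, \frac{\omega_{t-1}^{j_{t-1}} \, q(\xi_{t-1}^{j_{t-1}}, x)}{\sum_{\ell=1}^N \omega_{t-1}^\ell \, q(\xi_{t-1}^\ell, x)} \, L_{s,t,T}\left([\xi_{s:t-1}^{j_{s:t-1}}, x], h\right) , \tag{32}
\]

where $\varpi_{s:t-1}^{j_{s:t-1}}$ is defined in (15). Define, for
$t \in \{s, \ldots, T\}$,

\[
A_{s,t,T}(h) \overset{\mathrm{def}}{=} \sum_{\ell=1}^N \omega_t^\ell \, F_{s,t,T}(\xi_t^\ell, h) . \tag{33}
\]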

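Without loss of generality, we assume that $\phi_{\chi,s:T|T}(h) = 0$. With
the notations introduced above, $\phi^N_{\chi,s:T|T}(h)$ may be expressed as
the sum

\[
\phi^N_{\chi,s:T|T}(h) = \frac{A_{s,s,T}(h)}{A_{s,s,T}(\mathbb{1})} + \sum_{t=s+1}^{T} \left\{ \frac{A_{s,t,T}(h)}{A_{s,t,T}(\mathbb{1})} - \frac{A_{s,t-1,T}(h)}{A_{s,t-1,T}(\mathbb{1})} \right\} . \tag{34}
\]

We now compute an exponential bound for the terms appearing in the RHS of the
previous identity. Note that

\[
\frac{A_{s,s,T}(h)}{A_{s,s,T}(\mathbb{1})} = \frac{\Omega_s^{-1} \sum_{\ell=1}^N \omega_s^\ell \, L_{s,s,T}(\xi_s^\ell, h)}{\Omega_s^{-1} \sum_{\ell=1}^N \omega_s^\ell \, L_{s,s,T}(\xi_s^\ell, \mathbb{1})} .
\]

We apply Lemma 11 by successively checking conditions (I), (II) and (III) with
$a_N = \Omega_s^{-1} \sum_{\ell=1}^N \omega_s^\ell L_{s,s,T}(\xi_s^\ell, h)$,
$b_N = \Omega_s^{-1} \sum_{\ell=1}^N \omega_s^\ell L_{s,s,T}(\xi_s^\ell, \mathbb{1})$,
$c_N = 0$, and
$b = \beta = \phi_{\chi,s|s}(L_{s,s,T}(\cdot, \mathbb{1}))$. It follows
immediately from the definition that $|a_N / b_N| \leq |h|_\infty$. Moreover,
$\phi_{\chi,s:T|T}(h) = 0$ implies that
$\phi_{\chi,s|s}(L_{s,s,T}(\cdot, h)) = 0$; conditions (II) and (III) are then
directly derived from the exponential inequality for the auxiliary filter (see
Proposition 1, Eq. (24)).

We now establish an exponential inequality for

\[
\frac{A_{s,t,T}(h)}{A_{s,t,T}(\mathbb{1})} - \frac{A_{s,t-1,T}(h)}{A_{s,t-1,T}(\mathbb{1})}
\]

using again Lemma 11. We take $a_N = N^{-1} A_{s,t,T}(h)$,
$b_N = N^{-1} A_{s,t,T}(\mathbb{1})$, $c_N = N^{-1} A_{s,t-1,T}(h)$,
$d_N = N^{-1} A_{s,t-1,T}(\mathbb{1})$ and

\[
b = \beta \overset{\mathrm{def}}{=} \frac{\phi_{\chi,t-1|t-1}(L_{t-1,t-1,T}(\cdot, \mathbb{1}))}{\phi_{\chi,t-1|t-1}(\vartheta_t)} . \tag{35}
\]

By definition, $|a_N / b_N| \leq |h|_\infty$ and
$|c_N / d_N| \leq |h|_\infty$, showing Lemma 11, condition (I). We now check
condition (II). The function
$\xi_{s:t}^{j_{s:t}} \mapsto L_{s,t,T}(\xi_{s:t}^{j_{s:t}}, \mathbb{1})$
depends only on $\xi_t^{j_t}$; with a slight abuse of notation, we set
$L_{s,t,T}(\xi_{s:t}^{j_{s:t}}, \mathbb{1}) = L_{t,t,T}(\xi_t^{j_t}, \mathbb{1})$.
It follows from the definition (32) that
$F_{s,t,T}(\xi_t^\ell, \mathbb{1}) = L_{t,t,T}(\xi_t^\ell, \mathbb{1})$.
Plugging this into the definition of $A_{s,t,T}(\mathbb{1})$ yields
$A_{s,t,T}(\mathbb{1}) = \sum_{\ell=1}^N \omega_t^\ell \, L_{t,t,T}(\xi_t^\ell, \mathbb{1})$.
Condition (II) follows from Proposition 1, Eq. (23). Finally, we check
condition (III). Write
$a_N - (c_N / d_N) b_N = N^{-1} \sum_{\ell=1}^N \omega_t^\ell \, G_{s,t,T}(\xi_t^\ell, h)$,
where

\[
G_{s,t,T}(x, h) = F_{s,t,T}(x, h) - \frac{A_{s,t-1,T}(h)}{A_{s,t-1,T}(\mathbb{1})} \, F_{s,t,T}(x, \mathbb{1}) . \tag{36}
\]

Since $\{(\xi_t^\ell, \omega_t^\ell)\}_{\ell=1}^N$ are i.i.d. conditionally on
the $\sigma$-field $\mathcal{F}_{t-1}$, we have that
$\{\omega_t^\ell \, G_{s,t,T}(\xi_t^\ell, h)\}_{\ell=1}^N$ are also i.i.d.
conditionally on $\mathcal{F}_{t-1}$. This allows us to apply the conditional
Hoeffding inequality to
$N^{-1} \sum_{\ell=1}^N \omega_t^\ell \, G_{s,t,T}(\xi_t^\ell, h)$, provided
that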
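we have first checked that
$\mathbb{E}[\omega_t^1 \, G_{s,t,T}(\xi_t^1, h) \mid \mathcal{F}_{t-1}] = 0$
and that $\{\omega_t^i \, G_{s,t,T}(\xi_t^i, h)\}_{1 \leq i \leq N}$ are
bounded random variables. By (26), for any bounded function $f$,

\[
\mathbb{E}\left[ \omega_t^1 f(\xi_t^1) \mid \mathcal{F}_{t-1} \right] = \frac{\sum_{i=1}^N \omega_{t-1}^i \int Q(\xi_{t-1}^i, dx) \, g_t(x) \, f(x)}{\sum_{\ell=1}^N \omega_{t-1}^\ell \, \vartheta_t(\xi_{t-1}^\ell)} .
\]

Applying this relation with $f(\cdot) = F_{s,t,T}(\cdot, h)$ and using the
recursion (30), the previous relation implies by direct calculation

\[
\mathbb{E}\left[ \omega_t^1 \, F_{s,t,T}(\xi_t^1, h) \mid \mathcal{F}_{t-1} \right] = \frac{A_{s,t-1,T}(h)}{\sum_{\ell=1}^N \omega_{t-1}^\ell \, \vartheta_t(\xi_{t-1}^\ell)} . \tag{37}
\]

Therefore, since $A_{s,t-1,T}(h) / A_{s,t-1,T}(\mathbb{1})$ is
$\mathcal{F}_{t-1}$-measurable,

\[
\mathbb{E}\left[ \omega_t^1 \, G_{s,t,T}(\xi_t^1, h) \mid \mathcal{F}_{t-1} \right] = 0 \tag{38}
\]

by definition (36). Moreover, since
$|F_{s,t,T}(x, h)| \leq F_{s,t,T}(x, \mathbb{1}) \, |h|_\infty$, we have
$|A_{s,t-1,T}(h) / A_{s,t-1,T}(\mathbb{1})| \leq |h|_\infty$, showing that

\[
\left| \omega_t(x, x') \, G_{s,t,T}(x', h) \right|
= \left| \omega_t(x, x') \left( F_{s,t,T}(x', h) - \frac{A_{s,t-1,T}(h)}{A_{s,t-1,T}(\mathbb{1})} \, F_{s,t,T}(x', \mathbb{1}) \right) \right|
\leq 2 |\omega_t|_\infty |F_{s,t,T}(\cdot, \mathbb{1})|_\infty |h|_\infty
\leq 2 |\omega_t|_\infty |L_{t,t,T}(\cdot, \mathbb{1})|_\infty |h|_\infty < \infty . \tag{39}
\]

The Hoeffding inequality therefore implies:

\[
\mathbb{P}\left[ \left| a_N - \frac{c_N}{d_N} b_N \right| \geq \epsilon \right]
= \mathbb{P}\left[ \left| N^{-1} \sum_{\ell=1}^N \omega_t^\ell \, G_{s,t,T}(\xi_t^\ell, h) \right| \geq \epsilon \right]
\leq B \exp\left\{ - C N \epsilon^2 / |h|_\infty^2 \right\} ,
\]

showing condition (III) and concluding the proof.

□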



4. Asymptotic normality. We now derive a Central Limit Theorem (CLT) for the
forward filtering backward smoothing estimator (14). Consider the following
assumption.

A 3. For all $t \in \{1, \ldots, T\}$ and $M > 0$,
$\int \sup_{|x| \leq M} p_t(x, x') \, dx' < \infty$.

We first recall that, under assumptions (A1-2), the auxiliary particle filter
approximation of the filtering distribution satisfies a CLT (see for example
[?, Theorem 3.2]). For any bounded measurable function
$h : \mathsf{X} \to \mathbb{R}$, define the kernel

\[
L_{s,t}(x, h) = L_{s,s,t}(x, \bar{h}) , \tag{40}
\]

where $\bar{h}(x_{s:t}) = h(x_t)$. The quantity
$L_{s,t}(x, h) / L_{s,t}(x, \mathbb{1})$ may be interpreted as the conditional
expectation of $h(X_t)$ given the observations up to time $t$ and $X_s$,
evaluated at $X_s = x$. Moreover, for any distribution $\nu$,

\[
\phi_{\nu,t|s:t}(h) = \frac{\int \nu(dx_s) \, L_{s,t}(x_s, h)}{\int \nu(dx_s) \, L_{s,t}(x_s, \mathbb{1})} . \tag{41}
\]



Proposition 3. Assume A1-2. Then, for all bounded measurable functions
$h : \mathsf{X} \to \mathbb{R}$, and $0 \leq s \leq T$,



where 



with 



s 

r=0 

po {u;i{.)LlJ;h) 



(0X,O|-1 bo(-)-^0,s(-,l)] 



2 ' 



(42) 
(43) 

(44) 



and for 1 < r < s, 

'^X,r—l\T—l 



'&r{-) I Pr{-,x)u^{-,x)LjAx,h)dx (j)^. 



r— l|r~l 



{4>x,r\r-l [gr{-)Lr,s{-, 1)]) 



(45) 



Using the CLT for the auxiliary particle approximation of the filtering
distribution, we establish a CLT for the auxiliary particle approximation of
the fixed interval joint smoothing distribution. The proof is established
using this time a recursion going forward in time, i.e. a CLT for
$\phi^N_{\chi,s:t|t}(\cdot)$ is deduced from a CLT for
$\phi^N_{\chi,s:t-1|t-1}(\cdot)$. The proof is based on the techniques
developed in [?] (extending [?] and [20]), which are tailored to the analysis
of sequential Monte Carlo algorithms. Define, for $0 \leq s < t \leq T$,
$q_{s:t-1}(x_{s:t-1}, x) = q(x_{t-1}, x)$ and

\[
\upsilon_{s,t,T}(x, h) \overset{\mathrm{def}}{=} \vartheta_t(x) \int p_t(x, x') \, \omega_t^2(x, x') \, g_{s,t,T}^2(x', h) \, dx' , \tag{46}
\]

where

\[
g_{s,t,T}(x, h) \overset{\mathrm{def}}{=} \frac{\phi_{\chi,s:t-1|t-1}\left( q_{s:t-1}(\cdot, x) \, L_{s,t,T}([\cdot, x], h) \right)}{\phi_{\chi,t-1|t-1}\left( q(\cdot, x) \right)} . \tag{47}
\]

Theorem 4. Assume J^^^ Let s < T. Then, for all bounded measurable 
functions h : X^~^+^ R, 



{^X,s:T\T{h) - (Px,s:T\T{h) 

with 

def ^xMs [Ls,s,t{-, h 



/i-<^x,-t|t(M]) , (48) 



X,s:T\T 



[h] 



^ J2 '^X,t-l|t-iKf.T(-»)'/'X,t-l|f-i(^t) ^^g^ 

t=s+l 
where $\Gamma_{\chi,s|s}$ is defined in (42). Moreover,



Xv 

V 



N [<Px,s:T\T{h) - <Px,s:T\T{h) 



+ r. 



X,s.T\T 



/i-</'x,-T|T(/i)]) , (50) 



Proof. Without loss of generality, we assume that 4>x,s:T\T{h) = 0. Denote by 
(•, •) the scalar product, V^^j, = [1/^ . . . , F^^^^] and W^j, = [W^s,T^ • • • > W^t,t] 
the vectors given by 



(51) 



A.. 



,t-i,r 



(h) 



.s,T[ 



^M-i,r(l) 

N 

^M,t(1) 



As,t,T{l)j ,t = s + l,...,T , (52) 
,t = s + l,...,T . (53) 



where ^<i,t,T is defined in ([33|) . Using these notations, we decompose \/]V ^^^s:T\t{^) 
as follows 



(54) 



iV'Ax,s:T|T(/i) = (K,^:T(M,W^. 



T/ ' 



Since VK/^ = ^(^^ ,,|s(L5^s^t(-, 1))^ and similarly for t = s + 1, T 



{N-^Y.f=i^tLt,t.T{iu'^)y\ Proposition dl ([231) and ([MD show that 



1^. 



N P , 



1 



'^s,i,T >Ar-*oo 



<Px,s\s{Ls,s,t{-,'^)) 

0X,f_l|t-l(^^f) 
4>x,t\t-l{9t{-)Lt,t,T{-, 1)) 



(55) 

(56) 



Therefore (j48p follows from the application of the Slutsky Lemma provided that 
we establish a multivariate CLT for the sequence of random vectors Vjj^{h). For 
that purpose, we show that for any t G {s, . . . ,T} and any scalars ag, ■ ■ ■ ,at, 



V 



(57) 



r=l 



where [h] = F^^^i^ [-^^s,s,t(-, h)], and for r G {s + 1, . . .,T}, 



a. 



X,s.r\T 



def 4'x,s:r-l\r-lbjs,r,Ti-,h)] 
4>x,r-l\r-l{'&r) 



(58) 



First consider t = s. Since As^s,Tih) = Y^eLi ^iLs,s,T{ii, h) and 4>^^s\s{Ls,s,t{-, h)) 
0, Proposition [3] shows that A^^/^r27^A5^s^T(/i) is asymptotically normal with 
zero mean and variance Tjs^s,T{h)- Assume now that the property holds for some 
$t - 1 \geq s$. We apply the results on triangular arrays of dependent random
variables developed in [9]; $V^N_{s,t,T}(h)$ may be expressed as



N 



(59) 



where Gs,t,T is defined in (f36]) . Eq. ([381) shows that E [?7m/ I -^t-i] = 0, 
1, . . . , A^, which implies: 



E 



t-l 



t-1 



t-l 



converges m 



The induction assumption shows that E J2r=s ^rVji- xi^) 
distribution to a centered Gaussian distribution with covariance J^l—i o^^s,r,T{h)- 
We will now prove that 



E 



exp (inVW,^r(/i) 



:Ft-i 



>N-^oo exp 



-^s,r,T{h) 



By 0, Corollary 11], setting it remains to show that 



TV 

1=1 

N 



Ml 



UM,e^{\UM,t\>e} 



mi-i 



, for any e > , 



(60) 
(61) 



where, for £ e {1, ... , N}, Gn/ = ^t-i V a{uj{,^{,j <i).We first prove §0i). It 
follows from the definitions that 



AT 



m,e-i 



E 



1 VU^^MiU) 



u;tiCLi,x)Gs,t,Tix,h)) dx 



1 



N 



where 



^s,t,T{x,h) =^i?j(x) j pt{x, x')ujt {x, x')G1^ t{x' , h)dx' . 
We will show, by applying Lemma [T3l that 

1 ^ p 

E ^t-i'^s,t,Ti^f-i, h) — > <^x,t-ilt-i (^«,t,T(-, > 

^=1 

where fg^t^r is defined in (j46p . To that purpose, we need to show that (i) 
there exists a constant Voo such that \T s,t,T{--, h)\^< Voo P-a.s. \vs,t,T{-, h)\^ < 



(62) 
P



Uoo, and (ii) for any constant M > 0, sup|^|<^^ |Ts jT(rE,/i) — Vs^t,T{x,h)\ 
0. Eq. (IMD shows that \uJt{-)Gs,t,T{-, h)\^ <~2\uot\^\Lt,t,T{-,l)\^\h\^, which 
imphes that 

|T,,j,T(-,/i)L<2|^9tLklLl^tAT(-,l)lLl^lL , ■ (63) 

Similarly, the bound \gs,t,T\oo - \^t,t,Ti-, \h\^ implies that 

\vsM-^ h)L< l^tloo \^tt I^M,t(-, 1)|L \h\lo ■ (64) 
The bounds (j63p and (|64p imply (i). To prove (ii), first note that for all M > 0, 

sup \Ts^t,T{x,h) - Vs,t,T{x,h)\ 

\x\<M 

< l^^tloo I'^tloo / sup pt{x,x') Gl,Ax',h)-gl, Ax',h) dx' . (65) 



\x\<M 



< 



Under (Al3]), / sup|2,|<j,/pt(3;, x')dx' < oo, and since Gl ^ rp{x' , h) — g1t^'x{x' , h) 
2 \Lt^t,T{-, 1)1^ l^lL' Lemma [12] shows that 

/ sup pt{x, x') G^i-^j-ix', h) - glt,Tix', h) dx' ^ , 

•I \x\<M 

p 

provided that we can show that, for any given x e X, Gs,t,T{x, h) — > gs,t,T{x, h). 
The definitions ()32p of the function -Fs,t,T and (I16p of the smoothing distribution 
implies that 

\s:t-l\t-l[Qs,t-l{-,x)Ls,t,T{[-,x],h)] 



Fs^t,Tix,h) 



4>x,t-i\t-i{qi-,x)) 



Theorem [2] show that Fs^t,Tix, h) — > gs,t,T{x, h). On the other hand, it follows 
from the definitions that As,t-i,T{h) = Q.t-i(i)^^s:t\t{Ls,t,T{-,h)). 

Since (j).^^s:t\t{Ls,t,T{-, h)) = 0, this decomposition implies that As^t-i,T{h) — > 0. 

Therefore, Fs,t,T{x-,h) gs,t,T{x,h). 

It remains to check the tightness condition (j6ip . This property is straightfor- 
ward since \UM,i\ < N~^/'^ |a;t|^ \h\^. 

We now prove (|50p . Using (j20p and the definition of (I)^^s:T\t{^)i 'we may write 



N 



N (4>x,s:T\T{h) - (t>x,s:T\T{h)) = iV"^/' ^ 



h ( r^^^ 



E 



Me. 



5:T 



T 



+ 



('^X,^:T|t(^) - <Px,s:T\T{h) 



Note that since {Jg.j^)i<e<N are iid conditional to J^t, (|50]) follows from (jl8]) and 
direct application of Corollary 11 in by noting that 



TV 



h ( & 



E 



T 



P ,2 



^ - 4>x,s:T\T{h) 



□ 
5. Time-uniform deviation inequality. In this section, we study the
long-term behavior of the marginal fixed-interval smoothing distribution esti- 
mator. For that purpose, it is required to impose a form of mixing condition on 
the Markov transition kernel. For simplicity, we consider conditions which are 
similar to the ones used in B chapter 7.4] or 0, chapter 4]; these conditions 
can be relaxed, but at the expense of many technical problems. This condition 
requires that the transition kernel is strongly mixing in the sense that 

A 4. There exist two constants $0 < \sigma_- < \sigma_+ < \infty$ such that, for any $(x,x') \in X \times X$,

\[ \sigma_- \le q(x,x') \le \sigma_+ . \tag{66} \]

In addition, there exists a constant $c_- > 0$ such that $\int \chi(dx_0)\, g_0(x_0) \ge c_-$ and, for all $t \ge 1$,

\[ \inf_{x \in X} \int q(x,x')\, g_t(x')\, dx' \ge c_- > 0 . \tag{67} \]
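As a concrete illustration of the constants in (66) and (67), the following minimal sketch evaluates them on a finite state space, where the transition density is simply a matrix with strictly positive entries. The function name `mixing_constants`, the matrix `q` and the likelihood vectors in `g_list` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def mixing_constants(q, g_list):
    """Return (sigma_minus, sigma_plus, rho, c_minus) for a finite-state kernel.

    q      -- transition matrix, q[x][x2] > 0 (density w.r.t. counting measure)
    g_list -- list of likelihood vectors g_t(.), one per time step t >= 1
    """
    entries = [value for row in q for value in row]
    sigma_minus, sigma_plus = min(entries), max(entries)
    # rho = 1 - sigma_minus / sigma_plus is the contraction rate used in (69)-(70)
    rho = 1.0 - sigma_minus / sigma_plus
    # c_minus = inf over x and t of sum_{x'} q(x, x') g_t(x'), as in (67)
    c_minus = min(
        sum(qx * gx for qx, gx in zip(row, g))
        for g in g_list
        for row in q
    )
    return sigma_minus, sigma_plus, rho, c_minus


# Hypothetical three-state example.
q = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
g_list = [[0.9, 0.4, 0.1], [0.2, 0.7, 0.6]]
sigma_minus, sigma_plus, rho, c_minus = mixing_constants(q, g_list)
print(sigma_minus, sigma_plus, rho, c_minus)
```

On a general state space the minimum and maximum would be replaced by an infimum and supremum of the density, which is why the assumption typically forces the state space to be compact.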

Note that Assumption A4 implies that $\nu(X) < \infty$; in the sequel, we will assume without loss of generality that $\nu(X) = 1$. Moreover, the average number of simulations required in the accept-reject mechanism per sample of the FFBSi algorithm is bounded by $\sigma_+/\sigma_-$. An important consequence of the uniform ergodicity condition is the forgetting of either the initial or the final conditions.
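The bound $\sigma_+/\sigma_-$ on the accept-reject cost can be checked on a toy example. The sketch below assumes one standard accept-reject scheme (an assumption, not a transcription of the paper's algorithm): an index $j$ is proposed with probability proportional to its weight and accepted with probability $q(\xi^j, x)/\sigma_+$. The names `expected_proposals`, `weights` and `q_values` are hypothetical.

```python
def expected_proposals(weights, q_values, sigma_plus):
    """Expected number of proposals until acceptance (geometric distribution).

    weights  -- unnormalised particle weights (proposal probabilities after normalisation)
    q_values -- q(xi_j, x) for each particle j, all within [sigma_minus, sigma_plus]
    """
    total = sum(weights)
    # Probability that a single proposal is accepted.
    accept_prob = sum(w / total * qv / sigma_plus for w, qv in zip(weights, q_values))
    return 1.0 / accept_prob


# Hypothetical values respecting sigma_minus <= q <= sigma_plus.
weights = [0.1, 0.4, 0.2, 0.3]
q_values = [0.25, 0.9, 0.5, 0.3]
sigma_minus, sigma_plus = 0.2, 1.0
mean_trials = expected_proposals(weights, q_values, sigma_plus)
bound = sigma_plus / sigma_minus
print(mean_trials, bound)
```

Since every `q_values` entry is at least `sigma_minus`, the acceptance probability is at least `sigma_minus / sigma_plus`, so the expected number of proposals cannot exceed `bound` whatever the weights are.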
The following Proposition extends some of the results obtained initially in 
and later extended in 0] (see also [B, Chapter 7] and [3, Chapter 2]). Define 

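\[ \ell_{s,t}(x_s,x_t) := \int \cdots \int q(x_s,x_{s+1})\, g_{s+1}(x_{s+1}) \prod_{u=s+1}^{t-1} q(x_u,x_{u+1})\, g_{u+1}(x_{u+1})\, dx_{s+1:t-1} \tag{68} \]

for $s < t$, and $\ell_{t,t}(x_s,x_t) := \delta_{x_s}(x_t)$, so that

\[ L_{s,t}(x_s,h) = \int \ell_{s,t}(x_s,x_t)\, h(x_t)\, dx_t . \]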



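Proposition 5. Assume (A4). Then, for all distributions $\chi$, $\chi'$, for all $s \le t$ and any bounded measurable function $h$,

\[ \left| \frac{\iint \chi(dx_s)\, \ell_{s,t}(x_s,x_t)\, h(x_t)\, dx_t}{\iint \chi(dx_s)\, \ell_{s,t}(x_s,x_t)\, dx_t} - \frac{\iint \chi'(dx_s)\, \ell_{s,t}(x_s,x_t)\, h(x_t)\, dx_t}{\iint \chi'(dx_s)\, \ell_{s,t}(x_s,x_t)\, dx_t} \right| \le \rho^{t-s} \operatorname{osc}(h) , \tag{69} \]

where $\rho := 1 - \sigma_-/\sigma_+$. In addition, for any bounded non-negative measurable functions $f$ and $f'$,

\[ \left| \frac{\iint \chi(dx_s)\, h(x_s)\, \ell_{s,t}(x_s,x_t)\, f(x_t)\, dx_t}{\iint \chi(dx_s)\, \ell_{s,t}(x_s,x_t)\, f(x_t)\, dx_t} - \frac{\iint \chi(dx_s)\, h(x_s)\, \ell_{s,t}(x_s,x_t)\, f'(x_t)\, dx_t}{\iint \chi(dx_s)\, \ell_{s,t}(x_s,x_t)\, f'(x_t)\, dx_t} \right| \le \rho^{t-s} \operatorname{osc}(h) , \tag{70} \]

as soon as the denominators are non-zero.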
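Proof. The first statement is well-known; see for example and 0, Proposition 4.3.23]. To check the second statement, denote by $B_{\chi,s,t}$ the backward smoothing kernel, defined for $s \le t$ and for any real-valued, measurable function $\psi$ on $X \times X$ by

\[ E_\chi\left[ \psi(X_{t-1},X_t) \mid Y_{s:t} \right] = \iint \psi(x_{t-1},x_t)\, \phi_{\chi,s:t|s:t}(dx_t)\, B_{\chi,s,t}(x_t,dx_{t-1}) , \]

where $\phi_{\chi,s:t-1|s:t-1}$ is defined in (1). This kernel is absolutely continuous with respect to the dominating measure $\lambda$ and its density is given by

\[ b_{\chi,s,t}(x_t,x_{t-1}) = \frac{\phi_{\chi,s:t-1|s:t-1}(x_{t-1})\, q(x_{t-1},x_t)}{\int \phi_{\chi,s:t-1|s:t-1}(x)\, q(x,x_t)\, dx} . \]

Under (A4), this transition density is lower bounded by

\[ b_{\chi,s,t}(x_t,x_{t-1}) \ge \frac{\phi_{\chi,s:t-1|s:t-1}(x_{t-1})\, \sigma_-}{\int \phi_{\chi,s:t-1|s:t-1}(x)\, \sigma_+\, dx} = \frac{\sigma_-}{\sigma_+}\, \phi_{\chi,s:t-1|s:t-1}(x_{t-1}) . \]

Since $\operatorname{osc}(B_{\chi,s,t}(\cdot,h)) \le \rho \operatorname{osc}(h)$, it follows that

\[ \operatorname{osc}\left( B_{\chi,s,t} \cdots B_{\chi,s,s+1}(\cdot,h) \right) \le \rho^{t-s} \operatorname{osc}(h) . \tag{71} \]

Note that

\[ \left| \frac{\iint \chi(dx_s)\, h(x_s)\, \ell_{s,t}(x_s,x_t)\, f(x_t)\, dx_t}{\iint \chi(dx_s)\, \ell_{s,t}(x_s,x_t)\, f(x_t)\, dx_t} - \frac{\iint \chi(dx_s)\, h(x_s)\, \ell_{s,t}(x_s,x_t)\, f'(x_t)\, dx_t}{\iint \chi(dx_s)\, \ell_{s,t}(x_s,x_t)\, f'(x_t)\, dx_t} \right| \]
\[ = \left| \frac{\phi_{\chi,t|s:t}\left( B_{\chi,s,t} \cdots B_{\chi,s,s+1}(\cdot,h)\, f(\cdot) \right)}{\phi_{\chi,t|s:t}(f(\cdot))} - \frac{\phi_{\chi,t|s:t}\left( B_{\chi,s,t} \cdots B_{\chi,s,s+1}(\cdot,h)\, f'(\cdot) \right)}{\phi_{\chi,t|s:t}(f'(\cdot))} \right| \]
\[ = \left| \mu_f\left[ B_{\chi,s,t} \cdots B_{\chi,s,s+1}(\cdot,h) \right] - \mu_{f'}\left[ B_{\chi,s,t} \cdots B_{\chi,s,s+1}(\cdot,h) \right] \right| , \]

with, for any $A \in \mathcal{B}(X)$, $\mu_f(A) := \phi_{\chi,t|s:t}(1_A f)/\phi_{\chi,t|s:t}(f)$. Therefore, since for any probabilities $\mu$ and $\mu'$ on $\mathcal{B}(X)$ and any measurable function $\psi$, $|\mu(\psi) - \mu'(\psi)| \le \operatorname{osc}(\psi)$, (71) shows that

\[ \left| \mu_f\left[ B_{\chi,s,t} \cdots B_{\chi,s,s+1}(\cdot,h) \right] - \mu_{f'}\left[ B_{\chi,s,t} \cdots B_{\chi,s,s+1}(\cdot,h) \right] \right| \le \rho^{t-s} \operatorname{osc}(h) . \]

□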

The goal of this section consists in establishing, under the assumptions men- 
tioned above, that the FFBS approximation of the marginal fixed interval smooth- 
ing probability satisfies an exponential deviation inequality with constants that 
are uniform in time. 

The first step in the proof consists in showing a time-uniform deviation in- 
equality for the auxiliary particle filter. Here again, the proof of this result could 
be adapted from 0, Section 7.4.3]. For the sake of clarity, we present a self- 
contained proof, which is valid under assumptions that are weaker than those 
used in [1, Chapter 7]. 
Proposition 6. Assume that A2-4 hold with $T = \infty$. Then, the filtering distribution satisfies a time-uniform exponential deviation inequality, i.e. there exist constants $B$ and $C$ such that, for all integers $N$ and $t \ge 0$, all measurable functions $h$ and all $\epsilon > 0$,



N 



N-'Y.^lh{il) 



4>x,t\t-i{9th) 



i=l 



> e 



> e 



(72) 
(73) 



Proof. We first prove (73). Without loss of generality, we will assume that $\phi_{\chi,t|t}(h) = 0$. Similar to [1, Eq. (7.24)], the particle estimate of $\phi_{\chi,t|t}(h)$ is decomposed as



+ 



^o,t(l) 



where 



(74) 



(75) 



We first establish an exponential inequality for $B_{0,t}(h)/B_{0,t}(1)$ in which the dependence on $t$ is made explicit. For that purpose, we apply Lemma 11, successively checking Conditions (I), (II), and (III), with
QN =^ BQ^t{h) 
bN = Bo,t{l) 
CN = 



def 



Ix{dxo)go{xo)jt^ 



Under the strong mixing condition AlH it may be shown that (see 0, chapter 4] 
or 0, section 4.3.3]) 



which implies that b> j3. Since (t>x,t\t{^) = 0, Eqs. ([^T]) and ([69]) imply 



(76) 



CLN 




Bo,t{h) 


bN 




BoA^) 



(Px,t\tW 



Eili u^h^oA^l 1) / x{dxo)go{xo)LoAxo, 1) 



</9*osc(/i) . (77) 



This shows condition (I) with M = p'' \ h\^. We now turn to condition (II). We 



have 



TV 



i=l 



io,t(-,i)L 



po(dxo)wo(xo)5'o(2;o) 



l^o,t(-,i)L 
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Since 



< 



$\le |\omega_0|_\infty$, we have by Hoeffding's inequality

\[ P\left[ |b_N - b| \ge \epsilon \right] \le B \exp\left( -C N \epsilon^2 \right) , \]

where the constants $B$ and $C$ do not depend on $t$. This shows condition (II). We now check condition (III). We have



N 



UN - {cN/dN)bN = CLN = N ^^UJq 



1=1 



^o,t(-,i)L 



Now, as $\phi_{\chi,t|t}(h) = 0$ implies $\int \chi(dx)\, L_{0,t}(x,h) = 0$, it holds that $E(a_N) = 0$. Moreover,



^01 



l^o,i(-,l) 



^o,f(a,i) (loMm 
« Uo,t(a,i) 



{h) 



^ I'^oloo — /0*OSc(/l) 



using (31) and (69). Condition (III) follows from Hoeffding's inequality. Then, Lemma 11 gives

\[ P\left[ |B_{0,t}(h)/B_{0,t}(1)| \ge \epsilon \right] \le B \exp\left( -C N \epsilon^2 / (\rho^t \operatorname{osc}(h))^2 \right) , \]

where the constants $B$ and $C$ do not depend on $t$.
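Hoeffding's inequality, invoked above for conditions (II) and (III), can be compared with an exact tail probability in a simple case. The sketch below is only an illustration under stated assumptions: it uses i.i.d. Bernoulli variables (not the weighted particle sums of the proof), and the helper names `hoeffding_bound` and `exact_binomial_tail` are hypothetical.

```python
import math


def hoeffding_bound(n, eps, width=1.0):
    """Two-sided Hoeffding bound for the mean of n i.i.d. variables with range `width`."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / width ** 2)


def exact_binomial_tail(n, p, eps):
    """Exact P(|mean - p| >= eps) for the mean of n i.i.d. Bernoulli(p) variables."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) >= eps:
            total += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
    return total


n, p = 200, 0.3
results = [
    (eps, exact_binomial_tail(n, p, eps), hoeffding_bound(n, eps))
    for eps in (0.05, 0.1, 0.15)
]
for eps, exact, bound in results:
    print(eps, exact, bound)
```

The exact tail is always below the bound, and both decay like $\exp(-CN\epsilon^2)$, which is the shape of the inequalities required by Lemma 11.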

We now consider, for $1 \le s \le t$, the difference $B_{s,t}(h)/B_{s,t}(1) - B_{s-1,t}(h)/B_{s-1,t}(1)$, where $B_{s,t}$ is defined in (75). We again use Lemma 11, where $P(\cdot) = P(\cdot \mid \mathcal{F}_{s-1})$,

$a_N = B_{s,t}(h)$, $b_N = B_{s,t}(1)$, $c_N = B_{s-1,t}(h)$, $d_N = B_{s-1,t}(1)$,



Eili / Q(^Li, dx)5.(x)Lo,s(x, 1) 



io,s(-,l)LEf=ia;f-i^.(ei) 



, and /? 



C_(7_ 



where cr_ and c_ are defined in (j66j) and (|67|) , respectively. It appears using ([76 
and (t^ that b> (3. Moreover, 



OAT 


CN 










-iLs-lACl- 




bN 


dN 






,t(e,l) 




-iLs^lACs^ 


-1,1) 



< /> osc (h) , 
(78) 

showing condition ([T[) with M = ^osc{h). We now check condition ([IT]) . By 
([26[). we have 



TV 



5Ar 



6 = iV-^E 



i=l 



I^m(-,i)L 



E 



1 LsA^sA) 



Thus, since |a;*Ls,i(^s, l)/|-^^s,t(-, l)!^^! ^ supjwjioo cr+Zo"-, we have by condi- 
tional Hoeffding's inequality 
showing condition (II) with constants which do not depend on $s$. Moreover, write
«^ - l^^N = N-^ Zti rf where 



Ef=i^_iLs-i,t(e-i,i) V 'I^m(->i)Ic 



Since {(^t , are i.i.d. conditionally to the cr-field J^t-i, we have that 
^ are also i.i.d. conditionally to Tt-i- Moreover, it can be easily checked 
using (|26p that E [r/^ | .Ft_i] = 0. In order to apply the conditional Hoeffding 
inequality, we need to check that i]^ is bounded. In fact, using (f^TI) and (f69]) . 



Consequently, 



CN , 
UN OAT 



t~s 



OSC (/l) 



> e 



^=1 



> e 



< S exp <^ -CN 



/3* OSC (/l) 



where the constants $B$ and $C$ do not depend on $s$. This shows condition (III). Finally, by Lemma 11,



> e 



Ts-\ < Bexp 



BsA^) Bs^iAl) 
The proof is concluded by using Lemma 



-CN 



P* OSC (/l) 



□ 



We now show that the time-uniform deviation inequality for the filtering estimator extends, under the mixing assumption (A4) on the Markov kernel $Q$, to the FFBS smoothing estimator. The key result for establishing a time-uniform bound for the FFBS smoothing estimator is the following lemma, which establishes the uniform ergodicity of the particle approximation of the backward kernel.

Lemma 7. Assume (A4). Then, for any probability distributions $\mu$ and $\mu'$ on the set $\{1,\dots,N\}$, any integers $0 \le s \le t$ and any function $h$ on $\{1,\dots,N\}$,



N 



is:t = l 



< OSC (h) p* 



where vo\"^ is defined in (fT5]) . 



Proof. For u e {s + 1, . . . ,t}, define Wu the N x N matrix with entries 



l<i,j <N . 



imsart-aos ver. 2007/12/10 file: dgarm.tex date: April 2, 2009 



PARTICLE APPROXIMATION OF SMOOTHING DISTRIBUTIONS 



23 



The matrix $W_u$ can be interpreted as the Markov transition matrix of a non-homogeneous Markov chain on the state space $\{1,\dots,N\}$ (which may be seen as the particle approximation of the backward kernel (2)). Using this notation, for any probability distribution $\mu$ on $\{1,\dots,N\}$ and any function $h$ on $\{1,\dots,N\}$, the corresponding sum over the indices $i_{s:t}$ may be interpreted as the expectation of the function $h$ under the marginal distribution at time $t-s$ of a non-homogeneous Markov chain started at time 0 from the initial distribution $\mu$ and driven by the transition matrices $W_t, W_{t-1}, \dots$:

N N 

«s:t = l is:t = i 

Under (AH]), the entries of these transition kernels are lower-bounded by u-jo^. 
Therefore, the Dobrushin coefficient of each transition matrix $W_u$, $u \in \{s+1,\dots,t\}$, is upper bounded by $\rho$ (see [1]). The result follows. □
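The contraction argument of this proof can be checked numerically. The sketch below assumes that row $i$ of each backward matrix is proportional to (weight of particle $j$) times (a transition density value in $[\sigma_-, \sigma_+]$); this form, and the names `backward_matrix`, `apply_matrix` and `osc`, are assumptions made for illustration. Under that assumption every row dominates $(\sigma_-/\sigma_+)$ times the normalised weights, so each application shrinks the oscillation by at least the factor $\rho$.

```python
import random


def backward_matrix(weights, q_rows):
    """Row i is the probability vector proportional to weights[j] * q_rows[i][j]."""
    matrix = []
    for row in q_rows:
        unnormalised = [w * qv for w, qv in zip(weights, row)]
        total = sum(unnormalised)
        matrix.append([u / total for u in unnormalised])
    return matrix


def apply_matrix(matrix, h):
    """Return the vector (W h)(i) = sum_j W[i][j] * h[j]."""
    return [sum(p * hj for p, hj in zip(row, h)) for row in matrix]


def osc(h):
    """Oscillation of a function on a finite set."""
    return max(h) - min(h)


random.seed(0)
n_particles, steps = 6, 5
sigma_minus, sigma_plus = 0.2, 1.0
rho = 1.0 - sigma_minus / sigma_plus

h = [random.uniform(-1.0, 1.0) for _ in range(n_particles)]
osc_h = osc(h)
f = list(h)
gaps = []  # pairs (oscillation after k steps, rho**k * osc(h))
for k in range(1, steps + 1):
    weights = [random.uniform(0.1, 1.0) for _ in range(n_particles)]
    q_rows = [[random.uniform(sigma_minus, sigma_plus) for _ in range(n_particles)]
              for _ in range(n_particles)]
    f = apply_matrix(backward_matrix(weights, q_rows), f)
    gaps.append((osc(f), rho ** k * osc_h))

for observed, bound in gaps:
    print(observed, bound)
```

Because the difference of the expectations of any function under two probability distributions is at most its oscillation, the first entry of each pair bounds the quantity controlled in Lemma 7, whatever the two initial distributions are.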

We then show that the existence of time-uniform exponential deviation in- 
equality for the auxiliary particle filter approximation of the filtering distribution 
extends to the FFBS smoothing estimator. 

Theorem 8. Assume that A2-4 hold with $T = \infty$. Then, there exist constants $0 \le B, C < \infty$ such that, for all integers $N$, $s$, and $T$ with $s \le T$, and all $\epsilon > 0$,

KmtW - 0x,^|t(^)| > < i?e-^^^'/°-'('^) , (80) 
where 4>x,s\T^^) ^'^^ 4'x,s\Tih) are defined in (fTll) and (f2T]) . 

Proof. (80) follows from (79) along the same lines as in Theorem 2, whose notations we use. Let $h$ be a function defined on $X$ and let $s$, $T$ be positive integers such that $s \le T$. Without loss of generality, we assume that $\phi_{\chi,s|T}(h) = 0$. We will denote by

\[ \bar h : (x_s,\dots,x_T) \mapsto h(x_s) . \tag{81} \]

For $s \in \{0,\dots,T\}$, consider again the following decomposition

1 (h) = = As,s,T(h) ^ i As,t,T(h) A,,t-i,r(/t) l 

^X,s\tW ^^^^^^(-^^ ^^^^^^(-L) + ^ Z.^ I As,t-1,T{1) j ' 



where h, As^s,T, and ^s,t,r are defined in (jSTI) . (jSTI) and (p3[) . respectively. Note 
that h depends of only through its first component S^l" ; therefore, it follows 
from the definition ([Ml) of Ls,t,T that Ls,t,T{C:t = h{Cs')Lt,t,T{€t' A)- In 
particular, Ls,t,T{C-tA) = Lt,t,T{Ct' A)- 

We first consider the term As^s,T{h) / As^s,t{^)- It follows from the definition that 

N N 
We apply Lemma 11 with

6^ = 0,-i^,,,t(1)/IW(->1)Ioc 

CAT = 

b = (I)^^s\s[Ls,s,t{-, 1)]/ \Ls,s,t{-, 1)L 

J = (^~/(^+ ■ 

Using the definition ([29]) and (Ag]), for any s e {0, . . . , T}, 

'7+ 



(82) 



Therefore $b \ge \beta$. Then, note that $|a_N/b_N| \le |h|_\infty$; therefore condition (I) is satisfied with $M = |h|_\infty$. We now check condition (II). We have



Inequalities (73) and (82) show that there exist constants $B$ and $C$ such that, for any $\epsilon > 0$ and all positive integers $s \le T$,

Hence, condition (II) is satisfied. Moreover,



a^-^JLi,^ = aN = n-'jZ^iGs{ii), where Gs{i) = h{i)^^^^^^^^^ 



Using the definition ()29p of Ls,s,T 



(h) 



'Px,s\s[H-)Ls,s,Ti-, 1)] 
4'x,s\s[Ls,s,T{-,'i-)] 

The condition $\phi_{\chi,s|T}(h) = 0$ therefore implies that $\phi_{\chi,s|s}(G_s) = 0$. On the other hand, using (82), $|G_s|_\infty \le |h|_\infty\, \sigma_+/\sigma_-$. Hence, by (73),



CAT , 
Oat — 3— Oat 

(In 



> e 



for some $B$ and $C$ which depend neither on $s$ nor on $T$. Hence condition (III) is satisfied. Combining the results above, Lemma 11 therefore shows that



> e 



< 2Bexp [-CNe^/ osc^{h) 



We now consider the term As,t,T(/i)Ms,t,T(l) - ^s,t-i,T(/i)Ms,t-i,T(l) for t> s. 
For that purpose, we use Lemma [TT] with 

' aN = N-^ As,t,T{h)/ \Lt,t A- ^'^)\oo 
6^ = iV-M,,i,r(l)/|L4,i,T(-,l)L 

CAT = As^t-l,T{h) 

cLn = As,t-i,T(l) 



_l|,_l[Li_i,t_i,T(-, l)]/(|Li,i,T(-, 1)L '/'x,*-i|t-i(^t)) 



[/3 = c_a_/(a+|.?.L) 
Inequalities (82) and (67) directly imply that $b \ge \beta$. Moreover
hN (In 



N 



EMC 



ls:t-l 



(83) 

where lJ-t~ii-,S,) ^-^id are two probability distributions on the set {1, ... , N} 

defined as 



^t-i(^,0 



and fi't^iii) 



It follows from Lemma [7] that, for all 



^t-l-^t-l,f~l,T(CLl> 1) 

(84) 



N 



< osc (/i) p 



t-l-s 



which in turn implies that 



bN dN 



< osc (/i) p 



t-l-s 



(85) 



showing condition (|T]) with M = osc (h) p^ ^ ^. We now consider the condition 
(IIIl) . It follows from the definition of that: 



\Lt,t,T{-, ■ 



By dlS]) and ((82 



P[|67V - fe| > e] < Be 



where the constants $B$ and $C$ do not depend on the time indexes $t$ and $T$. This relation shows condition (II). We finally consider condition (III). Using (83),



OAT 



6Ar = J7,-'EiIi^?Gi(^l), where 



G*(e)= E hm^-i 

«s:t-l = l 

[ a^;!:j^g(et^,0 u;;!:j^L,_i,,_i,T(g;!:i^,i) 1 L,,,,T(g,i) 
Using that



E 



N 



1=1 



t-l 



fi / q{Ct-i,x)gt{x)Lt,t,T{x, l)dx 



t-l,t-l,T 



t-l A) 



ElM-iMCLi) 



ElM-iMCLi) 



and 



E 



Etia;|-iL,-i,t-i.T(Cti,l) 



it follows that E {ijj\Gt(^i\) \ -^t-i] = 0. On the other hand, using Lemma [7] (with 
[I and [j! defined in ([8ij) ) and (f82|) . |Gt(OI < p*""*"^ osc (/i). We may therefore 
apply the Hoeffding inequality to show that 



N 



i=l 



> e 



< 2exp 



2Ne^ 



2(/l) p2{t-s-l) 



osc 



showing that condition (III) is satisfied with constants that do not depend on $t$. Combining these results, Lemma 11 shows that there exist constants $B$ and $C$ such that, for all $s < t$,



As,t,Tih) A 



s,t-l,T 



(h) 



> e 



< S exp -CN 



^,t,T(l) ^,t-l,T(l) 

The proof is concluded by applying Lemma 14.



p2{t-s-l) osc2(/l) 



□ 



6. A limiting expression of the variance of the marginal smoothing distribution. In this section, we study the expression of the variance (49) for the FFBS approximation of the marginal smoothing distribution. In particular, we show that, under the strong mixing condition, the asymptotic variance $\Gamma_{\chi,s|T}[\bar h]$ of the marginal smoothing estimator, where $\bar h$ is defined in (81), has a finite limiting value as $T \to \infty$ for a given value of $s$. We will also show that this variance is upper bounded uniformly in time, which allows uniform confidence intervals to be constructed.

The first step consists in showing that the asymptotic variance of the auxiliary 
particle filter has a finite limiting value as T ^ oo, and deriving an upper-bound 
for this limit. 

Proposition 9. Assume hold for T = oo. Then, with the notations 

of Proposition O 

Tx,s\s W < sup \u;r\oc Wrloc T~~^ OSC^{h) . 
Proof. Without loss of generality, we assume that $\phi_{\chi,s|s}(h) = 0$. Note that, for any $r \in \{0,\dots,s\}$, under (A4),



J Q{x, dx')gr {x')Lr^s ^ a+ 

'Ax,r|r-lbr(-)^r,s (•,!)] ~ Cr~ 



Lr,s{Xi 1) '/'xi^k 

[ir,s(-,l)] 



(86) 
(87) 



We now bound r = . . . s, defined (jH]) and (|15|) . First, for r = 0, using 

that x{Lo,si-, h)) = 0, Proposition [5] and inequality (f76l) show that 



Po(^e(-)iL(-») _ X (;§;(•) [5o(-)^o,s(-»]') 



((Ax.Ohl [5o(-)^0,s(-, 1)]) (0x,Ohl bo(-)^0,s(-, 1)]) 

Lo4;l) X(^0,s(-,1)) 



dx ^ f 5o(-)-^o,s(-, 1) 



dpo 1 xbo(-)-^o,s(-,l)] 



xigo) cr- 



C- a- 



Similarly, for r > 0, using that (j)^^j.^j.[Lr^s{'T h)] = 0, Eqs. ([86]) and ([87|) show 
that 



( • ) / ( • , a;)a;^ ( • , x) (x , /i) dx 



^Xi''~l|''~l 



(0X,r-|r-l br(-)-^r,s(-,l)]^ 

(5(-, dx)5(,.(x)a;r(-, a^) x ... 



Lr,s{x, 1) 



Lr,s{x, 1) 



^X,r-|r- 



[^r,s(-,l)] 



< 



'Ax,r|r-lbr(-)] 



c_ cr_ 



which implies that < l^rloo ('^+/c- (^~)p'^^^' osc^(/i) |wrloo- '^^^ result 



follows. 



□ 



We are now in position to state and prove the main result of this section, 
which provides a uniform bound for the variance of the particle estimator of the 
marginal smoothing distribution. 

Theorem 10. Assume hold for T = oo. Then, for any s <T , 



osc^(/i) j 1 fa 



\C_ Vcr_/ r>0 



— ( — ) SUp\uJr\^\'&r\oo + ZTZ3-^^^'P\'^r\oo\^r\oo\9r\oo I ' 



C_CJ_ r->0 



where the function h and the covariance Px.sir [^] '^''"^ defined in (j81|) and (|49p . 
respectively. 
Proof. We bound the summands appearing in (49). Consider first the term
T^^s\s [I-'s,s,t{-, h)] /(j)'^ ^|^[Ls,s,t('5 !)]■ We first apply Proposition[9]to the function 
Ls,s,t{-, h)/<PxMs [Ls,s,t{-, !)]• Using ([76]) and Ls,s,t(-, h) = h{-)Ls,s,T{-, 1), 



Ls^s,t{x,1) 



< 



hence osc (^Ls^s,t{-, h)/4>x,s\s [Ls,s,t{-, 1)]) < ^ osc (/i), and 

< — — sup|w^|^|# I 



Now, we write 



We will show that 

(l)x,s:t-i\t-i[H-)qs,t-i{-,x')] Lt^t,T{x',l) _ a 



-dx' 



(t>x,t\t-i [9t{-)Lt,t,T{-, 1)] 
Using this inequality, 

'Px,t-i\t-i{'"s,t,T{-,h.))4'x,t~i\t~i{^t) 

'/'J,tii_ibt(-)iM,T(-,i)] 



< 



+ J 



G-C- 



p'-'osc{h) . (89) 



< 



< 



< 



^+ \ J2{t-s) 



osc (/l)^ 



oo 



where we used the Fubini Theorem in the last step. Let us finally turn to the 
proof of the inequality ([89]) . 



(I)^^s:t-l\t-l [H-)Qs,t-l{-,x')] Lt^t,T{x',l) 

4>x,t\t-l [9t{-)Lt,t,T{-, 1)] 
_ / <Px,s\s(^^'^)^(^s)ls,t-i{xs,xt-i)q{xt-i,x')Lt^t,Tix',l)dxt-i 
J (l)-^^s\sidxs)ls,t-i{xs, xt-i) {/ gt{xt)q{xt-i,Xt)Lt^t,T{xt, l)dxj} dxt-i 

The last expression can be written A x B with 

J (l)x,s\s(^^s)h{xs)ls,t~iixs,xt~i)q{xt-i,x')dxt-i 



A 



J (p^^sis{dxs)ls,t~iixs, xt^i)q{xt^i,x')dxt^i 



and 

^ _ I 4>^^s\s{dxs)ls,t-iixs, xt-i)q{xt-i, x')Lt^t,T{x' , l)dxt-i 

/ 'Px,s\sidxs)ls,t-iixs, xt-i) {J gt{xt)q{xt-i,xt)Lt^t,T{xt, l)dxj dxt-i ' 

We will bound these two terms separately. Since 4>x,s\t{^) = 0) 

/ (t>x,s\si'^^s)ls,t-l{Xs,Xt^i)Lt^i^t~l,T{xt~l, l)dxt_i 

Thus, by Proposition [H 

1^1 = l^-^'l < /?*-"osc(/i) . 

On the other hand, as q{xt-i,x') < cr+, as / gt{xt)q{xt-i, xt)dxt > c_ and as 
for every xt it holds that 

Lt^t,T{xt, 1) ^ o\_ 
Lt^t,T{x', 1) ~ (J+ ' 

B is upper-bounded by (T^/((7_c_). 

□

APPENDIX A: TECHNICAL RESULTS

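Lemma 11. Assume that $a_N$, $b_N$, $c_N$, $d_N$ and $b$ are random variables such that there exist positive constants $\beta$, $B_1$, $C_1$, $B_2$, $C_2$, $M$ such that

(I) $|a_N/b_N - c_N/d_N| \le M$, $P$-a.s. and $b \ge \beta$, $P$-a.s.
(II) For all $\epsilon > 0$ and all $N \ge 1$, $P(|b_N - b| > \epsilon) \le B_1 e^{-C_1 N \epsilon^2}$,
(III) For all $\epsilon > 0$ and all $N \ge 1$, $P(|a_N - (c_N/d_N) b_N| > \epsilon) \le B_2 e^{-C_2 N (\epsilon/M)^2}$.

Then,

\[ P\left( \left| \frac{a_N}{b_N} - \frac{c_N}{d_N} \right| > \epsilon \right) \le B_1 e^{-C_1 N \left( \frac{\epsilon \beta}{2M} \right)^2} + B_2 e^{-C_2 N \left( \frac{\epsilon \beta}{2M} \right)^2} . \]

Proof. Write

\[ \left| \frac{a_N}{b_N} - \frac{c_N}{d_N} \right| \le b^{-1} \left| \frac{a_N}{b_N} - \frac{c_N}{d_N} \right| |b - b_N| + b^{-1} \left| a_N - \frac{c_N}{d_N} b_N \right| \le \beta^{-1} M |b - b_N| + \beta^{-1} \left| a_N - \frac{c_N}{d_N} b_N \right| , \quad P\text{-a.s.} \]

Thus,

\[ \left\{ \left| \frac{a_N}{b_N} - \frac{c_N}{d_N} \right| > \epsilon \right\} \subset \left\{ |b - b_N| > \frac{\epsilon \beta}{2M} \right\} \cup \left\{ \left| a_N - \frac{c_N}{d_N} b_N \right| > \frac{\epsilon \beta}{2} \right\} , \]

and the proof follows.

□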
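The inequality at the heart of this proof rests on an exact algebraic identity, which the following minimal sketch checks on randomly drawn values. The function name `ratio_gap_bound` and the sampling ranges are hypothetical; the ranges are chosen only so that condition (I) holds with $M = 2$ and $b \ge \beta = 0.5$.

```python
import random


def ratio_gap_bound(a_n, b_n, c_n, d_n, b, beta, m):
    """Return (lhs, rhs) of |a/b_N - c/d| <= (M |b - b_N| + |a - (c/d) b_N|) / beta."""
    lhs = abs(a_n / b_n - c_n / d_n)
    rhs = (m * abs(b - b_n) + abs(a_n - (c_n / d_n) * b_n)) / beta
    return lhs, rhs


random.seed(1)
beta, m = 0.5, 2.0
checks = []
for _ in range(1000):
    b = random.uniform(beta, 2.0)           # b >= beta
    b_n = random.uniform(0.1, 3.0)
    d_n = random.uniform(0.1, 3.0)
    a_n = random.uniform(-1.0, 1.0) * b_n   # |a_n / b_n| <= 1
    c_n = random.uniform(-1.0, 1.0) * d_n   # |c_n / d_n| <= 1, so the gap is <= M = 2
    checks.append(ratio_gap_bound(a_n, b_n, c_n, d_n, b, beta, m))

worst_slack = min(rhs - lhs for lhs, rhs in checks)
print(worst_slack)
```

The slack is never negative, which is the deterministic step; the probabilistic conclusion of the lemma then follows by splitting the event according to which of the two terms on the right exceeds half of $\epsilon$.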



Lemma 12. Let $\nu$ be a measure and $\{A_N(x)\}$ be a sequence of stochastic processes such that

1. for $\nu$-almost every $x$, $A_N(x) \stackrel{P}{\longrightarrow} a(x)$ as $N \to \infty$;
2. there exist a constant $C$ and a $\nu$-integrable function $h$ such that, for $\nu$-almost every $x$, $|A_N(x)| \le C h(x)$.

Then $\int |A_N(x) - a(x)|\, \nu(dx) \longrightarrow_{N \to \infty} 0$ in $L^1(P)$.

Proof. Under the stated assumptions, $E|A_N(x) - a(x)| \to 0$ as $N$ goes to infinity, $\nu$-a.e. On the other hand, $\nu$-a.e., $|A_N(x) - a(x)| \le 2Ch(x)$. The proof follows from the Fubini theorem and the dominated convergence theorem:

\[ E\left[ \int |A_N(x) - a(x)|\, \nu(dx) \right] = \int E|A_N(x) - a(x)|\, \nu(dx) \longrightarrow_{N \to \infty} 0 . \]

□

Lemma 13. Assume that (J^Jj^ hold for some T. Let {T j^{x)} he a sequence 
of stochastic processes and v a function such that (i) there exists a constant 
C < oo such that, for all N , < v^o and \v\^ < C and (ii) for all M > 0, 

sup|^|<Af |TAr(x) - v{x)\ -^N->oo 0. Then, t < T, f^^"^ J2eLi^t'^ NiCf) -^n^oo 



Proof. Write n]^^j:f=i^t{'^Niet) - viCt)} = Sn,i + Sn,2, with Sn,-_ 



dcf 



n~^'E'lioo't{rN{et)-vmH\et\ < M} and Sn,2 = n-^'E'lic^'d-^Niet) - 

v{Ct)}^{\Ct \ > Since Sj\i,i < sup|^.|<jy^ |T7v(x) — v{x)\, assmnption (ii) im- 
phes that Siy,i —^n-^oo 0. On the other hand, Siy,2 < '^CQ^^ J2eLi'-^t'^{\Ct \ > 
M}. By Proposition H 17,-1 ^ti ^MlCf'l > ^^^^ 0x,<|t(l{l " I > ^})- 
the proof follows since liiaiM^oo 4>x,t\i ' I ^ '-' 

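Lemma 14. Let $\{Y_{n,i}\}_{i=1}^{n}$ be a triangular array of random variables such that there exist constants $B > 0$, $C > 0$ and $\rho$, $0 < \rho < 1$, such that, for all $n$, $i \in \{1,\dots,n\}$ and $\epsilon > 0$,

\[ P\left( |Y_{n,i}| \ge \epsilon \right) \le B e^{-C \epsilon^2 \rho^{-2i}} . \]

Then, there exist constants $\bar B$ and $\bar C$ such that, for any $n$ and $\epsilon > 0$,

\[ P\left( \left| \sum_{i=1}^{n} Y_{n,i} \right| \ge \epsilon \right) \le \bar B e^{-\bar C \epsilon^2} . \]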

Proof. Denote by $S := \sum_{i=1}^{\infty} \sqrt{i}\, \rho^{i}$. It is plain to see that



Set eo > 0. The proof follows by noting that, for any e > eq, 

n 



=1 



□ 